(54) A micro-organism having reduced adaption to a particular environment



(57) A microorganism having a reduced adaptation to a particular environment identified with a method
comprising the steps of: (1) providing a plurality of microorganisms each of which is independently mutated
by the insertional inactivation of a gene with a nucleic acid comprising a unique marker sequence so that each
mutant contains a different marker sequence, or clones of the said microorganism; (2) providing individually a
stored sample of each mutant produced by step (1) and providing individually stored nucleic acid comprising the
unique marker sequence from each individual mutant; (3) introducing a plurality of mutants produced by step
(1) into the said particular environment and allowing those microorganisms which are able to do so to grow
in the said environment; (4) retrieving microorganisms from the said environment or a selected part thereof and
isolating the nucleic acid from retrieved microorganisms; (5) comparing any marker sequences in the nucleic
acid isolated in step (4) to the unique marker sequence of each individual mutant stored as in step (2);
and (6) selecting an individual mutant which does not contain any of the marker sequences as isolated in step
(4). Furthermore, the invention relates to: a gene isolated from the microorganism, a vaccine comprising the
microorganism, a polypeptide encoded by the gene, and a method for identifying compounds interfering with the
function of the gene, e.g. by antisense. The microorganism is exemplified with Salmonella typhimurium and its
VGC2 gene.
Description

The present invention relates to methods for the identification of genes involved in the adaptation of a
microorganism to its environment, particularly the identification of genes responsible for the virulence of a
pathogenic microorganism.

Background to the invention 

Antibiotic resistance in bacterial and other pathogens is becoming increasingly important. It is therefore important
to find new therapeutic approaches to attack pathogenic microorganisms.

Pathogenic microorganisms have to evade the host's defence mechanisms and be able to grow in a poor nutritional
environment to establish an infection. To do so a number of "virulence" genes of the microorganism are required.

Virulence genes have been detected using classical genetics and a variety of approaches have been used to
exploit transposon mutagenesis for the identification of bacterial virulence genes. For example, mutants have been
screened for defined physiological defects, such as the loss of iron regulated proteins (Holland et al, 1992), or in assays
to study the penetration of epithelial cells (Finlay et al, 1988) and survival within macrophages (Fields et al, 1989; Miller
et al, 1989a; Groisman et al, 1989). Transposon mutants have also been tested for altered virulence in live animal
models of infection (Miller et al, 1989b). This approach has the advantage that genes can be identified which are
important during different stages of infection, but is severely limited by the need to test a wide range of mutants
individually for alterations to virulence. Miller et al (1989b) used groups of 8 to 10 mice and orally infected 95 separate
groups, each with a different mutant, thereby using between 760 and 950 mice. Because of the extremely large numbers of
animals required, comprehensive screening of a bacterial genome for virulence genes has not been feasible.

Recently a genetic system (in vivo expression technology [IVET]) was described which positively selects for
Salmonella genes that are specifically induced during infection (Mahan et al, 1993). The technique will identify genes that
are expressed at a particular stage in the infection process. However, it will not identify virulence genes that are
regulated posttranscriptionally and, more importantly, will not provide information on whether the gene(s) which have been
identified are actually required for, or contribute to, the infection process.

Lee & Falkow (1994) Methods Enzymol. 236, 531-545 describe a method of identifying factors influencing the
invasion of Salmonella into mammalian cells in vitro by isolating hyperinvasive mutants.

Walsh and Cepko (1992) Science 255, 434-440 describe a method of tracking the spatial location of cerebral
cortical progenitor cells during the development of the cerebral cortex in the rat. The Walsh and Cepko method uses
a tag that contains a unique nucleic acid sequence and the lacZ gene but there is no indication that useful mutants or
genes could be detected by their method.

WO 94/26933 and Smith et al (1995) Proc. Natl. Acad. Sci. USA 92, 6479-6483 describe methods aimed at the
identification of the functional regions of a known gene, or at least of a DNA molecule for which some sequence
information is available.

Groisman et al (1993) Proc. Natl. Acad. Sci. USA 90, 1033-1037 describe the molecular, functional and
evolutionary analysis of sequences specific to Salmonella.

Some virulence genes are already known for pathogenic microorganisms such as Escherichia coli, Salmonella
typhimurium, Salmonella typhi, Vibrio cholerae, Clostridium botulinum, Yersinia pestis, Shigella flexneri and Listeria
monocytogenes but in all cases only a relatively small number of the total have been identified.

The disease which Salmonella typhimurium causes in mice provides a good experimental model of typhoid fever
(Carter & Collins, 1974). Approximately forty-two genes affecting Salmonella virulence have been identified to date
(Groisman & Ochman, 1994). These represent approximately one third of the total number of predicted virulence genes
(Groisman and Saier, 1990).

The object of the present invention is to identify genes involved in the adaptation of a microorganism to its
environment, particularly to identify further virulence genes in pathogenic microorganisms, with increased efficiency. A
further object is to reduce the number of experimental animals used in identifying virulence genes. Still further objects 
of the invention provide vaccines, and methods for screening for drugs which reduce virulence. 


Summary of the invention 

A first aspect of the invention provides a method for identifying a microorganism having a reduced adaptation to 
a particular environment comprising the steps of: 


(1) providing a plurality of microorganisms each of which is independently mutated by the insertional inactivation 
of a gene with a nucleic acid comprising a unique marker sequence so that each mutant contains a different marker 
sequence, or clones of the said microorganism; 
(2) providing individually a stored sample of each mutant produced by step (1) and providing individually stored
nucleic acid comprising the unique marker sequence from each individual mutant; 

(3) introducing a plurality of mutants produced by step (1) into the said particular environment and allowing those 
microorganisms which are able to do so to grow in the said environment; 

(4) retrieving microorganisms from the said environment or a selected part thereof and isolating the nucleic acid
from the retrieved microorganisms;

(5) comparing any marker sequences in the nucleic acid isolated in step (4) to the unique marker sequence of 
each individual mutant stored as in step (2); and 

(6) selecting an individual mutant which does not contain any of the marker sequences as isolated in step (4). 


Thus, the method uses negative selection to identify microorganisms with reduced capacity to proliferate in the
environment. 
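
The negative selection of steps (1) to (6) can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the method as described: the mutant names, the four-letter tags and the function name select_attenuated are hypothetical, and tags are compared as exact strings rather than by hybridisation.

```python
def select_attenuated(stored_tags, recovered_tags):
    """Return the mutants whose tag is absent from the recovered population.

    stored_tags: dict mapping mutant id -> unique tag, as stored in step (2).
    recovered_tags: iterable of tags found in the nucleic acid isolated
    from the environment in step (4).
    """
    recovered = set(recovered_tags)              # step (5): compare tags
    return sorted(mutant for mutant, tag in stored_tags.items()
                  if tag not in recovered)       # step (6): negative selection


# Hypothetical pool of four tagged mutants (step 1) and their stored tags (step 2).
stored = {"m1": "ACGT", "m2": "TTGA", "m3": "GGCC", "m4": "CATG"}
# Steps (3) and (4): suppose only m1, m2 and m4 grow and are retrieved.
recovered = ["ACGT", "CATG", "TTGA", "ACGT"]

print(select_attenuated(stored, recovered))      # mutants missing from the output pool
```

In this toy run the only mutant whose tag is not recovered is m3, which is therefore the candidate with reduced adaptation to the environment.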

A microorganism can live in a number of different environments and it is known that particular genes and their 
products allow the microorganism to adapt to a particular environment. For example, in order for a pathogenic
microorganism, such as a pathogenic bacterium or pathogenic fungus, to survive in its host the product of one or more
virulence genes is required. Thus, in a preferred embodiment of the invention a gene of a microorganism which allows
the microorganism to adapt to a particular environment is a virulence gene.

Conveniently, the particular environment is a differentiated multicellular organism such as a plant or animal. Many
bacteria and fungi are known to infect plants and they are able to survive within the plant and cause disease because
of the presence of and expression from virulence genes. Suitable microorganisms when the particular environment is
a plant include the bacteria Agrobacterium tumefaciens which forms tumours (galls) particularly in grape; Erwinia
amylovora; Pseudomonas solanacearum which causes wilt in a wide range of plants; Rhizobium leguminosarum which
causes disease in beans; Xanthomonas campestris pv. citri which causes canker in citrus fruits; and include the fungi
Magnaporthe grisea which causes rice blast disease; Fusarium spp. which cause a variety of plant diseases; Erysiphe
spp.; Colletotrichum gloeosporioides; Gaeumannomyces graminis which causes root and crown diseases in cereals
and grasses; Glomus spp.; Laccaria spp.; Leptosphaeria maculans; Phoma tracheiphila; Phytophthora spp.;
Pyrenophora teres; Verticillium alboatrum and V. dahliae; and Mycosphaerella musicola and M. fijiensis. As described in more
detail below, when the microorganism is a fungus a haploid phase to its life cycle is required.

Similarly, many microorganisms including bacteria, fungi, protozoa and trypanosomes are known to infect animals,
particularly mammals including humans. Survival of the microorganism within the animal and the ability of the
microorganism to cause disease is due in large part to the presence of and expression from virulence genes. Suitable bacteria
include Bordetella spp. particularly B. pertussis, Campylobacter spp. particularly C. jejuni, Clostridium spp. particularly
C. botulinum, Enterococcus spp. particularly E. faecalis, Escherichia spp. particularly E. coli, Haemophilus spp.
particularly H. ducreyi and H. influenzae, Helicobacter spp. particularly H. pylori, Klebsiella spp. particularly K. pneumoniae,
Legionella spp. particularly L. pneumophila, Listeria spp. particularly L. monocytogenes, Mycobacterium spp.
particularly M. smegmatis and M. tuberculosis, Neisseria spp. particularly N. gonorrhoeae and N. meningitidis, Pseudomonas
spp. particularly Ps. aeruginosa, Salmonella spp., Shigella spp., Staphylococcus spp. particularly S. aureus,
Streptococcus spp. particularly S. pyogenes and S. pneumoniae, Vibrio spp. and Yersinia spp. particularly Y. pestis. All of these
bacteria cause disease in man and also there are animal models of the disease. Thus, when these bacteria are used
in the method of the invention, the particular environment is an animal which they can infect and in which they cause
disease. For example, when Salmonella typhimurium is used to infect a mouse the mouse develops a disease which
serves as a model for typhoid fever in man. Staphylococcus aureus causes bacteraemia and renal abscess formation
in mice (Albus et al (1991) Infect. Immun. 59, 1008-1014) and endocarditis in rabbits (Perlman & Freedman (1971)
Yale J. Biol. Med. 44, 206-213).

It is required that a fungus or higher eukaryotic parasite is haploid for the relevant parts of its life (such as growth
in the environment). Preferably, a DNA-mediated integrative transformation system is available and, when the
microorganism is a human pathogen, conveniently an animal model of the human disease is available. Suitable fungi
pathogenic to humans include certain Aspergillus spp. (for example A. fumigatus), Cryptococcus neoformans and
Histoplasma capsulatum. Clearly the above-mentioned fungi have a haploid phase and DNA-mediated integrative
transformation systems are available for them. Toxoplasma may also be used, being a parasite with a haploid phase during
infection. Bacteria have a haploid genome.

Animal models of human disease are often available in which the animal is a mouse, rat, rabbit, dog or monkey. 
It is preferred if the animal is a mouse. Virulence genes detected by the method of the invention using an animal model 
of a human disease are clearly very likely to be genes which determine the virulence of the microorganism in man. 

Particularly preferred microorganisms for use in the methods of the invention are Salmonella typhimurium,
Staphylococcus aureus, Streptococcus pneumoniae, Enterococcus faecalis, Pseudomonas aeruginosa and Aspergillus
fumigatus.

A preferred embodiment of the invention is now described. 
A nucleic acid comprising a unique marker sequence is made as follows. A complex pool of double stranded DNA
sequence "tags" is generated using oligonucleotide synthesis and a polymerase chain reaction (PCR). Each DNA "tag"
has a unique sequence of between about 20 and 80 bp, preferably about 40 bp, which is flanked by "arms" of about
15 to 30 bp, preferably about 20 bp, which are common to all "tags". The number of bp in the unique sequence is
sufficient to allow large numbers (for example >10^10) of unique sequences to be generated by random oligonucleotide
synthesis but not so large as to allow the formation of secondary structures which may interfere with a PCR. Similarly
the length of the arms should be sufficient to allow efficient priming of oligonucleotides in a PCR.
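
The tag layout and the size of the sequence space can be illustrated with a short Python sketch. It assumes a fully random four-base core, and the two 20 bp arm sequences and all function names are placeholders invented for the example; they are not sequences from this specification.

```python
import random

BASES = "ACGT"


def count_unique_sequences(length):
    """Number of distinct sequences of the given length over A, C, G and T."""
    return 4 ** length


def make_tag(left_arm, right_arm, unique_length=40, rng=random):
    """Build one tag: a random unique core flanked by the two common arms."""
    core = "".join(rng.choice(BASES) for _ in range(unique_length))
    return left_arm + core + right_arm


# Placeholder 20 bp arms, invented for illustration only.
LEFT_ARM = "ACGTACGTACGTACGTACGT"
RIGHT_ARM = "TGCATGCATGCATGCATGCA"

tag = make_tag(LEFT_ARM, RIGHT_ARM, rng=random.Random(0))
print(len(tag))                               # 20 + 40 + 20 bp
print(count_unique_sequences(40) > 10 ** 10)  # a 40 bp core far exceeds 10^10 variants
```

Under these assumptions a 40 bp random core offers 4^40 possible sequences, many orders of magnitude more than the 10^10 given as an example above.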

It is well known that the sequence at the 5' end of the oligonucleotide need not match the target sequence to be 
amplified. 

It is usual that the PCR primers do not contain any complementary structures with each other longer than 2 bases,
especially at their 3' ends, as this feature may promote the formation of an artifactual product called "primer dimer".
When the 3' ends of the two primers hybridize, they form a "primed template" complex, and primer extension results
in a short duplex product called "primer dimer".

Internal secondary structure should be avoided in primers. For symmetric PCR, a 40-60% G+C content is often
recommended for both primers, with no long stretches of any one base. The classical melting temperature calculations
used in conjunction with DNA probe hybridization studies often predict that a given primer should anneal at a specific
temperature or that the 72°C extension temperature will dissociate the primer/template hybrid prematurely. In practice,
the hybrids are more effective in the PCR process than generally predicted by simple Tm calculations.

Optimum annealing temperatures may be determined empirically and may be higher than predicted. Taq DNA
polymerase does have activity in the 37-55°C region, so primer extension will occur during the annealing step and the
hybrid will be stabilized. The concentrations of the primers are equal in conventional (symmetric) PCR and are, typically,
within the 0.1 to 1 µM range.
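
The primer rules of thumb given above (40-60% G+C, no long single-base stretches, no 3' complementarity longer than 2 bases) can be checked mechanically. The following Python sketch is one possible way to do so and is not taken from this specification: the two primer sequences are invented, the limit of 4 for a "long stretch" is an assumed value, and only exactly terminal 3' overlaps are examined.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def longest_run(primer):
    """Length of the longest stretch of a single repeated base."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", primer.upper()))


def three_prime_overlap(primer_a, primer_b):
    """Longest 3'-terminal stretch of primer_a that pairs with the 3' end of primer_b."""
    a = primer_a.upper()
    rc_b = primer_b.upper().translate(COMPLEMENT)[::-1]  # reverse complement of b
    best = 0
    for n in range(1, min(len(a), len(rc_b)) + 1):
        if a[-n:] == rc_b[:n]:
            best = n
    return best


def primer_pair_ok(primer_a, primer_b, max_run=4, max_overlap=2):
    """Apply the rules of thumb; max_run is an assumed threshold."""
    for primer in (primer_a, primer_b):
        if not 0.40 <= gc_fraction(primer) <= 0.60:
            return False
        if longest_run(primer) > max_run:
            return False
    return three_prime_overlap(primer_a, primer_b) <= max_overlap


# Hypothetical primers, invented for illustration only.
FWD = "ATCCTACAACCTCAAGCTTC"
REV = "TACCCATTCTAACCAAGCAT"
print(primer_pair_ok(FWD, REV))
```

A check of this kind only screens out obvious problems; as noted above, optimum annealing conditions are still best determined empirically.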

The "tags" are ligated into a transposon or transposon-like element to form the nucleic acid comprising a unique
marker sequence. Conveniently, the transposon is carried on a suicide vector which is maintained as a plasmid in a
"helper" organism, but which is lost after transfer to the microorganism of the method of the invention. For example,
the "helper" organism may be a strain of Escherichia coli, the microorganism of the method may be Salmonella and
the transfer is a conjugal transfer. Although the transposon can be lost after transfer, in a proportion of cells it undergoes
a transposition event through which it integrates at random, along with its unique tag, into the genome of the
microorganism used in the method. It is most preferred if the transposon or transposon-like element can be selected. For
example, in the case of Salmonella, a kanamycin resistance gene may be present in the transposon and exconjugants
are selected on medium containing kanamycin. It is also possible to complement an auxotrophic marker in the recipient
cell with a functional gene in the nucleic acid comprising the unique marker. This method is particularly convenient
when fungi are used in the method.

Preferably the complementing functional gene is not derived from the same species as the recipient microorganism,
otherwise non-random integration events may occur.

It is also particularly convenient if the transposon or transposon-like element is carried on a vector which is
maintained episomally (ie not as part of the chromosome) in the microorganism used in the method of the first aspect of
the invention when in a first given condition whereas, upon changing that condition to a second given condition, the
episome cannot be maintained, permitting the selection of a cell in which the transposon or transposon-like element
has undergone a transposition event through which it integrates at random, along with its unique tag, into the genome
of the microorganism used in the method. This particularly convenient embodiment is advantageous because, once a
microorganism carrying the episomal vector is made, then each time the transposition event is selected for or induced
by changing the condition of the microorganism (or a clone thereof) from a first given condition to a second given
condition, the transposon can integrate at a different site in the genome of the microorganism. Thus, once a master
collection of microorganisms is made, each member of which contains a unique tag sequence in the transposon or
transposon-like element carried on the episomal vector (when in the first given condition), it can be used repeatedly
to generate pools of random insertional mutants, each of which contains a different tag sequence (ie unique within the
pool). This embodiment is particularly useful because (a) it reduces the number and complexity of manipulations
required to generate the plurality ("pool") of independently mutated microorganisms in step (1) of the method; and (b)
the number of different tags need only be the same as the number of microorganisms in the plurality of microorganisms
in step (1) of the method. Point (a) makes the method easier to use in organisms for which transposon mutagenesis
is more difficult to perform (for example, Staphylococcus aureus) and point (b) means that tag sequences with
particularly good hybridisation characteristics can be selected, therefore making quality control easier. As is described in
more detail below, the "pool" size is conveniently about 100 or 200 independently-mutated microorganisms and,
therefore, the master collection of microorganisms is conveniently stored in one or two 96-well microtitre plates.

In a particularly preferred embodiment the first given condition is a first particular temperature or temperature range
such as 25°C to 32°C, most preferably about 30°C, and the second given condition is a second particular temperature
or temperature range such as 35°C to 45°C, most preferably 42°C. In further preferred embodiments the first given
condition is the presence of an antibiotic, such as streptomycin, and the second given condition is the absence of the
said antibiotic; or the first given condition is the absence of an antibiotic and the second given condition is the presence
of the said antibiotic.

Transposons suitable for integration into the genome of Gram negative bacteria include Tn5, Tn10 and derivatives
thereof. Transposons suitable for integration into the genome of Gram positive bacteria include Tn916 and derivatives
or analogues thereof. Transposons particularly suited for use with Staphylococcus aureus include Tn917 (Cheung et
al (1992) Proc. Natl. Acad. Sci. USA 89, 6462-6466) and Tn918 (Albus et al (1991) Infect. Immun. 59, 1008-1014).

It is particularly preferred if the transposons have the properties of the Tn917 derivatives described by Camilli et
al (1990) J. Bacteriol. 172, 3738-3744, and are carried by a temperature-sensitive vector such as pE194Ts (Villafane
et al (1987) J. Bacteriol. 169, 4822-4829).

It will be appreciated that although transposons are convenient for insertionally inactivating a gene, any other
known method, or method developed in the future, may be used. A further convenient method of insertionally inactivating
a gene, particularly in certain bacteria such as Streptococcus, is using insertion-duplication mutagenesis such as that
described in Morrison et al (1984) J. Bacteriol. 159, 870 with respect to S. pneumoniae. The general method may also
be applied to other microorganisms, especially bacteria.

For fungi, insertional mutations are created by transformation using DNA fragments or plasmids carrying the "tags"
and, preferably, selectable markers encoding, for example, resistance to hygromycin B or phleomycin (see Smith et
al (1994) Infect. Immun. 62, 5247-5254). Random, single integration of DNA fragments encoding hygromycin B
resistance into the genome of filamentous fungi, using restriction enzyme mediated integration (REMI; Schiestl & Petes
(1991); Lu et al (1994) Proc. Natl. Acad. Sci. USA 91, 12649-12653), is known.

A simple insertional mutagenesis technique for a fungus is described in Schiestl & Petes (1994), incorporated
herein by reference, and includes, for example, the use of Ty elements and ribosomal DNA in yeast.

Random integration of the transposon or other DNA sequence allows isolation of a plurality of independently
mutated microorganisms wherein a different gene is insertionally inactivated in each mutant and each mutant contains a
different marker sequence.

A library of such insertion mutants is arrayed in welled microtitre dishes so that each well contains a different
mutant microorganism. DNA comprising the unique marker sequence from each individual mutant microorganism
(conveniently, the total DNA from the clone is used) is stored. Conveniently, this is done by removing a sample of the
microorganism from the microtitre dish, spotting it onto a nucleic acid hybridisation membrane (such as nitrocellulose
or nylon membranes), lysing the microorganism in alkali and fixing the nucleic acid to the membrane. Thus, a replica
of the contents of the welled microtitre dishes is made.
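
Because each stored DNA spot must be traceable back to its well, it can help to fix a mapping between mutant number and well position. The Python sketch below shows one such bookkeeping scheme for 96-well dishes; the row-by-row fill order and the function name well_position are assumptions made for illustration and are not prescribed here.

```python
import string

ROWS = string.ascii_uppercase[:8]   # rows A to H of a 96-well dish
WELLS_PER_ROW = 12                  # columns 1 to 12
WELLS_PER_PLATE = len(ROWS) * WELLS_PER_ROW


def well_position(index):
    """Map a 0-based mutant index to (plate number, well label), filling row by row."""
    plate, offset = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(offset, WELLS_PER_ROW)
    return plate + 1, f"{ROWS[row]}{col + 1}"


print(well_position(0))    # first mutant, first well of plate 1
print(well_position(95))   # last well of plate 1
print(well_position(96))   # first well of plate 2
```

The same mapping can be applied to the membrane replica so that a weak spot can be assigned to a single stored mutant.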

Pools of the microorganisms from the welled microtitre dish are made and DNA is extracted. This DNA is used as
a target for a PCR using primers that anneal to the common "arms" flanking the "tags" and the amplified DNA is labelled,
for example with 32P. The product of the PCR is used to probe the DNA stored from each individual mutant to provide
a reference hybridisation pattern for the replicas of the welled microtitre dishes. This is a check that each of the individual
microorganisms does, in fact, contain a marker sequence and that the marker sequence can be amplified and labelled
efficiently.

Pools of transposon mutants are made to introduce into the particular environment. Conveniently, 96-well microtitre
dishes are used and the pool contains 96 transposon mutants. However, the lower limit for the pool is two mutants;
there is no theoretical upper limit to the size of the pool but, as discussed below, the upper limit may be determined in
relation to the environment into which the mutants are introduced.

Once the microorganisms are introduced into the said particular environment those microorganisms which are
able to do so are allowed to grow in the environment. The length of time the microorganisms are left in the environment
is determined by the nature of the microorganism and the environment. After a suitable length of time, the
microorganisms are recovered from the environment, DNA is extracted and the DNA is used as a template for a PCR using primers
that anneal to the "arms" flanking the "tags". The PCR product is labelled, for example with 32P, and is used to probe
the DNA stored from each individual mutant replicated from the welled microtitre dish. Stored DNAs are identified which
hybridise weakly or not at all with the probe generated from the DNA isolated from the microorganisms recovered from
the environment. These non-hybridising DNAs correspond to mutants whose adaptation to the particular environment has
been attenuated by insertion of the transposon or other DNA sequence.
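
The comparison of the reference hybridisation pattern with the pattern obtained from the recovered microorganisms can be expressed as a simple filter. The Python sketch below is illustrative only: the signal values, the well labels and both thresholds (min_input and max_ratio) are hypothetical and would have to be chosen for the detection system actually used.

```python
def attenuated_wells(input_signal, output_signal, min_input=100.0, max_ratio=0.1):
    """Wells that hybridise with the inoculum probe but weakly or not at all
    with the probe made from the recovered microorganisms."""
    hits = []
    for well, before in input_signal.items():
        if before < min_input:
            continue  # tag did not amplify or label well, so it is uninformative
        after = output_signal.get(well, 0.0)
        if after / before <= max_ratio:
            hits.append(well)
    return sorted(hits)


# Hypothetical spot intensities (arbitrary units).
inoculum = {"A1": 900.0, "A2": 850.0, "A3": 40.0, "A4": 1000.0}
recovered = {"A1": 880.0, "A2": 30.0, "A3": 35.0, "A4": 0.0}

print(attenuated_wells(inoculum, recovered))
```

Here wells A2 and A4 are flagged, while A3 is ignored because its tag gave a poor signal even with the probe made from the inoculum.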

In a particularly preferred embodiment the "arms" have no, or very little, label compared to the "tags". For example,
the PCR primers are suitably designed to contain no, or a single, G residue, the 32P-labelled nucleotide is dCTP and,
in this case, no or one radiolabelled C residue is incorporated in each "arm" but a greater number of radiolabelled C
residues are incorporated in the "tag". It is preferred if the "tag" has at least ten-fold more label incorporated than the
"arms"; preferably twenty-fold or more; more preferably fifty-fold or more. Conveniently the "arms" can be removed
from the "tag" using a suitable restriction enzyme, a site for which may be incorporated in the primer design.
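
The ratio of label in the "tag" to label in the "arms" can be estimated by counting positions at which labelled dCTP would be incorporated. The Python sketch below uses a simplified model that is an assumption of this example: the primers themselves are unlabelled, every C on either newly synthesised strand carries label, and the tag and primer sequences are invented.

```python
def labelled_c_counts(tag_core, primer_fwd, primer_rev):
    """Count dCTP incorporation sites in the tag versus the arms (both strands)."""
    core = tag_core.upper()
    # C on the top strand plus C on the bottom strand (opposite each top-strand G).
    in_tag = core.count("C") + core.count("G")
    # Primers are pre-synthesised and unlabelled; label enters an arm only
    # where a new strand copies a G in the primer of the opposite strand.
    in_arms = primer_fwd.upper().count("G") + primer_rev.upper().count("G")
    return in_tag, in_arms


# Hypothetical 40 bp core with 50% G+C and two primers containing one G each.
core = "ACGT" * 10
fwd = "ATCCTACAACCTCAAGCTTC"
rev = "TACCCATTCTAACCAAGCAT"

in_tag, in_arms = labelled_c_counts(core, fwd, rev)
print(in_tag, in_arms, in_tag / in_arms)
```

Under this model a 40 bp core with 50% G+C and single-G primers gives a ten-fold excess of label in the tag, in line with the minimum ratio preferred above.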

As discussed above, a particularly preferred embodiment of the invention is when the microorganism is a
pathogenic microorganism and the particular environment is an animal. In this embodiment, the size of the pool of mutants
introduced into the animal is determined by (a) the number of cells of each mutant that are likely to survive in the animal
(assuming a virulence gene has not been inactivated) and (b) the total inoculum of the microorganism. If the number
in (a) is too low then false positive results may arise and if the number in (b) is too high then the animal may die before
enough mutants have had a chance to grow in the desired way. The number of cells in (a) can be determined for each
microorganism used but it is preferably more than 50, more preferably more than 100.

The number of different mutants that can be introduced into a single animal is preferably between 50 and 500,
conveniently about 100. It is preferred if the total inoculum does not exceed 10^6 cells (and it is preferably 10^5 cells)
although the size of the inoculum may be varied above or below this amount depending on the microorganism and the
animal.

In a particularly convenient method an inoculum of 10^5 is used containing 1000 cells each of 100 different mutants
for a single animal. It will be appreciated that in this method one animal can be used to screen 100 mutants compared
to prior art methods which require at least 100 animals to screen 100 mutants.

However, it is convenient to inoculate three animals with the same pool of mutants so that at least two can be 
investigated (one as a replica to check the reliability of the method), whilst the third is kept as a back-up. Nevertheless, 
the method still provides a greater than 30-fold saving in the number of animals used. 
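
The arithmetic behind the inoculum size and the saving in animals can be written out explicitly. The Python sketch below only restates the figures given above; the function names are hypothetical, and the prior-art baseline of one animal per mutant is the minimum figure quoted in the text.

```python
def inoculum_size(n_mutants, cells_per_mutant):
    """Total number of cells in a pooled inoculum."""
    return n_mutants * cells_per_mutant


def animal_saving(n_mutants, animals_per_pool, animals_per_mutant_prior=1):
    """Fold reduction in animals relative to testing each mutant individually."""
    return (n_mutants * animals_per_mutant_prior) / animals_per_pool


print(inoculum_size(100, 1000))   # 100 mutants x 1000 cells = 10^5 cells
print(animal_saving(100, 3))      # three animals per pool: about 33-fold
```

With three animals per pool of 100 mutants the saving is 100/3, which is the "greater than 30-fold" figure; the saving would be larger still against prior-art protocols using several animals per mutant.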

The time between the pool of mutants being introduced into the animal and the microorganisms being recovered 
may vary with the microorganism and animal used. For example, when the animal is a mouse and the microorganism 
is Salmonella typhimurium, the time between inoculation and recovery is about three days. 

In one embodiment of the invention microorganisms are retrieved from the environment in step (4) at a site remote
from the site of introduction in step (3), so that the virulence genes being investigated include those involved in the
spread of the microorganism between the two sites.

For example, in a plant the microorganism may be introduced in a lesion in the stem or at one site on a leaf and 
the microorganism retrieved from another site on the leaf where a disease state is indicated. 

In the case of an animal, the microorganism may be introduced orally, intraperitoneally, intravenously or intranasally
and is retrieved at a later time from an internal organ such as the spleen. It may be useful to compare the virulence
genes identified by oral administration and those identified by intraperitoneal administration as some genes may be
required to establish infection by one route but not by the other. It is preferred if Salmonella is introduced
intraperitoneally.

Other preferred environments which may be used to identify virulence genes are animal cells in culture (particularly 
macrophages and epithelial cells) and plant cells in culture. Although using cells in culture will be useful in its own right, 
it will also complement the use of the whole animal or plant, as the case may be, as the environment. 

It is also preferred if the environment is a part of the animal body. Within a given host-parasite interaction, a number 
of different environments are possible, including different organs and tissues, and parts thereof such as the Peyer's 
patch. 

The number of individual microorganisms (ie cells) recovered from the environment should be at least twice,
preferably at least ten times, more preferably 100 times the number of different mutants introduced into the environment.
For example, when an animal is inoculated with 100 different mutants around 10,000 individual microorganisms should
be retrieved and their marker DNA isolated.

A further preferred embodiment comprises the steps: 

(1A) removing auxotrophs from the plurality of mutants produced in step (1); or 
(6A) determining whether the mutant selected in step (6) is an auxotroph; or 
both (1A) and (6A). 

It is desirable to distinguish an auxotroph (that is a mutant microorganism which requires growth factors not needed 
by the wild type or by prototrophs) and a mutant microorganism wherein a gene allowing the microorganism to adapt 
to a particular environment is inactivated. Conveniently, this is done between steps (1) and (2) or after step (6). 

Preferably auxotrophs are not removed when virulence genes are being identified. 

A second aspect of the invention provides a method of identifying a gene which allows a microorganism to adapt 
to a particular environment, the method comprising the method of the first aspect of the invention, followed by the 
additional step: 

(7) isolating the insertionally-inactivated gene or part thereof from the individual mutant selected in step (6). 
Methods for isolating a gene containing a unique marker are well known in the art of molecular biology. 
A further preferred embodiment comprises the further additional step: 

(8) isolating from a wild-type microorganism the corresponding wild-type gene using the insertionally-inactivated
gene isolated in step (7) or part thereof as a probe.

Methods for gene probing are well known in the art of molecular biology. 

Molecular biological methods suitable for use in the practice of the present invention are disclosed in Sambrook
et al (1989), incorporated herein by reference.

When the microorganism is a microorganism pathogenic to an animal and the gene is a virulence gene and a
transposon has been used to insertionally inactivate the gene, it is convenient for the virulence genes to be cloned by
digesting genomic DNA from the individual mutant selected in step (6) with a restriction enzyme which cuts outside
the transposon, ligating size-fractionated DNA containing the transposon into a plasmid, and selecting plasmid recombinants
on the basis of antibiotic resistance conferred by the transposon and not by the plasmid. The microorganism
genomic DNA adjacent to the transposon is sequenced using two primers which anneal to the terminal regions of the
transposon, and two primers which anneal close to the polylinker sequences of the plasmid. The sequences may be
subjected to DNA database searches to determine if the transposon has interrupted a known virulence gene. Thus,
conveniently, sequence obtained by this method is compared against the sequences present in the publicly available
databases such as EMBL and GenBank. Finally, if the interrupted sequence appears to be in a new virulence gene,
the mutation is transferred to a new genetic background (for example by phage P22-mediated transduction in the case
of Salmonella) and the LD50 of the mutant strain is determined to confirm that the avirulent phenotype is due to the
transposition event and not a secondary mutation.
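
The requirement that the restriction enzyme "cuts outside the transposon" can be illustrated with a minimal sketch. The recognition sites listed are those of four common enzymes; the transposon sequence is a made-up placeholder and the function name is hypothetical. The sketch only checks that a recognition site is absent from the transposon; it does not model size fractionation or ligation.

```python
# Recognition sites of four common enzymes (all palindromic, so scanning
# one strand is sufficient).
ENZYME_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "PstI": "CTGCAG",
}


def enzymes_cutting_outside(transposon_seq, enzyme_sites=ENZYME_SITES):
    """Return, sorted by name, the enzymes with no site inside the transposon."""
    seq = transposon_seq.upper()
    return sorted(name for name, site in enzyme_sites.items() if site not in seq)


# Placeholder sequence (NOT a real transposon); it contains BamHI and PstI sites.
toy_transposon = "ATGCGGATCCTTAGCTGCAGAATCG"
print(enzymes_cutting_outside(toy_transposon))
```

An enzyme returned by this check leaves the transposon intact, so the resistance marker and the flanking genomic DNA stay on one fragment.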

The number of individual mutants screened in order to detect all of the virulence genes in a microorganism depends
on the number of genes in the genome of the microorganism. For example, it is likely that 3000-5000 mutants of
Salmonella typhimurium need to be screened in order to detect the majority of virulence genes whereas for Aspergillus
spp., which has a larger genome than Salmonella, around 20 000 mutants are screened. Approximately 4% of non-essential
S. typhimurium genes are thought to be required for virulence (Grossman & Saier, 1990) and, if so, the S.
typhimurium genome contains approximately 150 virulence genes. However, the methods of the invention provide a
faster, more convenient and much more practicable route to identifying virulence genes.
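
The arithmetic behind these figures can be sketched as follows. The 4% and 150 values come from the text; the uniform random-insertion model is an assumption added purely for illustration and is not part of the described method.

```python
# Figures quoted in the text.
VIRULENCE_FRACTION = 0.04   # ~4% of non-essential genes thought to be required for virulence
VIRULENCE_GENES = 150       # approximate number of virulence genes

# Implied number of non-essential genes: 150 / 0.04.
implied_nonessential_genes = VIRULENCE_GENES / VIRULENCE_FRACTION


def expected_fraction_hit(n_mutants, n_genes):
    """Expected fraction of genes hit at least once.

    ASSUMPTION (not from the text): each mutant carries one insertion that
    lands in one of n_genes targets with equal probability.
    """
    return 1.0 - (1.0 - 1.0 / n_genes) ** n_mutants


for n in (3000, 5000):
    frac = expected_fraction_hit(n, implied_nonessential_genes)
    print(f"{n} mutants: ~{frac:.0%} of genes expected to carry an insertion")
```

Under that simplified model, more than half of the genes would be expected to carry an insertion once 3000 mutants have been screened, which is consistent with detecting "the majority" of virulence genes.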

A third aspect of the invention provides a microorganism obtained using the method of the first aspect of the 
invention. 

Such microorganisms are useful because they have the property of not being adapted to survive in a particular 
environment. 

In a preferred embodiment, a pathogenic microorganism is not adapted to survive in a host organism (environment)
and, in the case of microorganisms that are pathogenic to animals, particularly mammals, more particularly humans,
the mutant obtained by the method of the invention may be used in a vaccine. The mutant is avirulent, and therefore 
expected to be suitable for administration to a patient, but it is expected to be antigenic and give rise to a protective 
immune response. 

In a further preferred embodiment the pathogenic microorganism not adapted to survive in a host organism, obtained
by the methods of the invention, is modified, preferably by the introduction of a suitable DNA sequence, to
express an antigenic epitope from another pathogen. This modified microorganism can act as a vaccine for that other 
pathogen. 

A fourth aspect of the invention provides a microorganism comprising a mutation in a gene identified using the
method of the second aspect of the invention.

Thus, although the microorganism of the third aspect of the invention is useful, it is preferred if a mutation is 
specifically introduced into the identified gene. In a preferred embodiment, particularly when the microorganism is to 
be used in a vaccine, the mutation in the gene is a deletion or a frameshift mutation or any other mutation which is 
substantially incapable of reverting. Such gene-specific mutations can be made using standard procedures such as 
introducing into the microorganism a copy of the mutant gene on an autonomous replicon (such as a plasmid or viral
genome) and relying on homologous recombination to introduce the mutation into the copy of the gene in the genome 
of the microorganism. 

Fifth and sixth aspects of the invention provide a suitable microorganism for use in a vaccine and a vaccine com- 
prising a suitable microorganism and a pharmaceutically-acceptable carrier. 
The suitable microorganism is the aforementioned avirulent mutant.

Active immunisation of the patient is preferred. In this approach, one or more mutant microorganisms are prepared 
in an immunogenic formulation containing suitable adjuvants and carriers and administered to the patient in known 
ways. Suitable adjuvants include Freund's complete or incomplete adjuvant, muramyl dipeptide, the "Iscoms" of EP 
109 942, EP 180 564 and EP 231 039, aluminium hydroxide, saponin, DEAE-dextran, neutral oils (such as miglyol), 
vegetable oils (such as arachis oil), liposomes, Pluronic polyols or the Ribi adjuvant system (see, for example GB-A-2 189 141).
"Pluronic" is a Registered Trade Mark. The patient to be immunised is a patient requiring to be protected
from the disease caused by the virulent form of the microorganism. 

The aforementioned avirulent microorganisms of the invention or a formulation thereof may be administered by
any conventional method, including oral administration and parenteral (eg subcutaneous or intramuscular) injection. The treatment
may consist of a single dose or a plurality of doses over a period of time. 

Whilst it is possible for an avirulent microorganism of the invention to be administered alone, it is preferable to 
present it as a pharmaceutical formulation, together with one or more acceptable carriers. The carrier(s) must be
"acceptable" in the sense of being compatible with the avirulent microorganism of the invention and not deleterious to
the recipients thereof. Typically, the carriers will be water or saline which will be sterile and pyrogen free. 

It will be appreciated that the vaccine of the invention, depending on its microorganism component, may be useful 
in the fields of human medicine and veterinary medicine. 

Diseases caused by microorganisms are known in many animals, such as domestic animals. The vaccines of the 
invention, when containing an appropriate avirulent microorganism, particularly avirulent bacterium, are useful in man
but also in, for example, cows, sheep, pigs, horses, dogs and cats, and in poultry such as chickens, turkeys, ducks 
and geese. 

Seventh and eighth aspects of the invention provide a gene obtained by the method of the second aspect of the 
invention, and a polypeptide encoded thereby. 
By "gene" we include not only the regions of DNA that code for a polypeptide but also regulatory regions of DNA
such as regions of DNA that regulate transcription, translation and, for some microorganisms, splicing of RNA. Thus,
the gene includes promoters, transcription terminators, ribosome-binding sequences and for some organisms introns 
and splice recognition sites. 

Typically, sequence information of the inactivated gene obtained in step (7) is derived. Conveniently, sequences
close to the ends of the transposon are used as the hybridisation site of a sequencing primer. The derived sequence
or a DNA restriction fragment adjacent to the inactivated gene itself is used to make a hybridisation probe with which 
to identify, and isolate from a wild-type organism, the corresponding wild type gene. 

It is preferred if the hybridisation probing is done under stringent conditions to ensure that the gene, and not a 
relative, is obtained. By "stringent" we mean that the gene hybridises to the probe when the gene is immobilised on a 
membrane and the probe (which, in this case, is > 200 nucleotides in length) is in solution and the immobilised gene/
hybridised probe is washed in 0.1 x SSC at 65°C for 10 min. SSC is 0.15 M NaCl/0.015 M Na citrate.
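
As a small worked example, the salt concentrations of the 0.1 x SSC wash follow directly from the 1 x definition given above. The function name is hypothetical; the 1 x values are those stated in the text.

```python
# 1 x SSC as defined in the text, in mol/L.
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}


def ssc_molarity(strength):
    """Return the molar concentration of each solute at the given SSC strength."""
    return {solute: conc * strength for solute, conc in SSC_1X.items()}


wash = ssc_molarity(0.1)
for solute, conc in wash.items():
    print(f"{solute}: {conc * 1000:.1f} mM")
```

The stringent wash therefore contains roughly 15 mM NaCl and 1.5 mM Na citrate.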

Preferred probe sequences for cloning Salmonella virulence genes are shown in Figures 5 and 6 and described 
in Example 2. 

In a particularly preferred embodiment the Salmonella virulence genes comprise the sequence shown in Figures
5 and 6 and described in Example 2.

In further preference the genes are those contained within, or at least part of which is contained within, the se- 
quences shown in Figures 11 and 12 and which have been identified by the method of the second aspect of the 
invention. The sequences shown in Figures 11 and 12 are part of a gene cluster from Salmonella typhimurium which 
I have called virulence gene cluster 2 (VGC2). The positions of the transposon insertions are indicated within the sequence,
and these transposon insertions inactivate a virulence determinant of the organism. As is discussed more fully below
and in particular in Example 4, when the method of the second aspect of the invention is used to identify virulence
genes in Salmonella typhimurium, many of the nucleic acid insertions (and therefore genes identified) are clustered in
a relatively small part of the genome. This region, VGC2, contains other virulence genes which, as described below, 
form part of the invention. 

The gene isolated by the method of the invention can be expressed in a suitable host cell. Thus, the gene (DNA)
may be used in accordance with known techniques, appropriately modified in view of the teachings contained herein,
to construct an expression vector, which is then used to transform an appropriate host cell for the expression and
production of the polypeptide of the invention. Such techniques include those disclosed in US Patent Nos. 4,440,859
issued 3 April 1984 to Rutter et al, 4,530,901 issued 23 July 1985 to Weissman, 4,582,800 issued 15 April 1986 to
Crowl, 4,677,063 issued 30 June 1987 to Mark et al, 4,678,751 issued 7 July 1987 to Goeddel, 4,704,362 issued 3
November 1987 to Itakura et al, 4,710,463 issued 1 December 1987 to Murray, 4,757,006 issued 12 July 1988 to Toole,
Jr. et al, 4,766,075 issued 23 August 1988 to Goeddel et al and 4,810,648 issued 7 March 1989 to Stalker, all of which
are incorporated herein by reference. 

The DNA encoding the polypeptide constituting the compound of the invention may be joined to a wide variety of
other DNA sequences for introduction into an appropriate host. The companion DNA will depend upon the nature of
the host, the manner of the introduction of the DNA into the host, and whether episomal maintenance or integration is 
desired. 

Generally, the DNA is inserted into an expression vector, such as a plasmid, in proper orientation and correct 
reading frame for expression. If necessary, the DNA may be linked to the appropriate transcriptional and translational
regulatory control nucleotide sequences recognised by the desired host, although such controls are generally available
in the expression vector. The vector is then introduced into the host through standard techniques. Generally, not all of
the hosts will be transformed by the vector. Therefore, it will be necessary to select for transformed host cells. One
selection technique involves incorporating into the expression vector a DNA sequence, with any necessary control
elements, that codes for a selectable trait in the transformed cell, such as antibiotic resistance. Alternatively, the gene
for such selectable trait can be on another vector, which is used to co-transform the desired host cell. 

Host cells that have been transformed by the recombinant DNA of the invention are then cultured for a sufficient 
time and under appropriate conditions known to those skilled in the art in view of the teachings disclosed herein to 
permit the expression of the polypeptide, which can then be recovered.

Many expression systems are known, including bacteria (for example E. coli and Bacillus subtilis), yeasts (for
example Saccharomyces cerevisiae), filamentous fungi (for example Aspergillus), plant cells, animal cells and insect 
cells. 

The vectors include a prokaryotic replicon, such as the ColE1 ori, for propagation in a prokaryote, even if the vector
is to be used for expression in other, non-prokaryotic, cell types. The vectors can also include an appropriate promoter
such as a prokaryotic promoter capable of directing the expression (transcription and translation) of the genes in a
bacterial host cell, such as E. coli, transformed therewith.

A promoter is an expression control element formed by a DNA sequence that permits binding of RNA polymerase 
and transcription to occur. Promoter sequences compatible with exemplary bacterial hosts are typically provided in
plasmid vectors containing convenient restriction sites for insertion of a DNA segment of the present invention.

Typical prokaryotic vector plasmids are pUC18, pUC19, pBR322 and pBR329 available from Biorad Laboratories
(Richmond, CA, USA) and pTrc99A and pKK223-3 available from Pharmacia, Piscataway, NJ, USA.

A typical mammalian cell vector plasmid is pSVL available from Pharmacia, Piscataway, NJ, USA. This vector 
uses the SV40 late promoter to drive expression of cloned genes, the highest level of expression being found in T 
antigen-producing cells, such as COS-1 cells.

An example of an inducible mammalian expression vector is pMSG, also available from Pharmacia. This vector 
uses the glucocorticoid-inducible promoter of the mouse mammary tumour virus long terminal repeat to drive expression 
of the cloned gene. 

Useful yeast plasmid vectors are pRS403-406 and pRS413-416 and are generally available from Stratagene Cloning
Systems, La Jolla, CA 92037, USA. Plasmids pRS403, pRS404, pRS405 and pRS406 are Yeast Integrating plasmids
(YIps) and incorporate the yeast selectable markers HIS3, TRP1, LEU2 and URA3. Plasmids pRS413-416 are
Yeast Centromere plasmids (YCps).

A variety of methods have been developed to operably link DNA to vectors via complementary cohesive termini. 
For instance, complementary homopolymer tracts can be added to the DNA segment to be inserted to the vector DNA. 
The vector and DNA segment are then joined by hydrogen bonding between the complementary homopolymeric tails
to form recombinant DNA molecules. 

Synthetic linkers containing one or more restriction sites provide an alternative method of joining the DNA segment 
to vectors. The DNA segment, generated by endonuclease restriction digestion as described earlier, is treated with 
bacteriophage T4 DNA polymerase or E. coli DNA polymerase I, enzymes that remove protruding, 3'-single-stranded
termini with their 3'-5'-exonucleolytic activities, and fill in recessed 3'-ends with their polymerizing activities.

The combination of these activities therefore generates blunt-ended DNA segments. The blunt-ended segments 
are then incubated with a large molar excess of linker molecules in the presence of an enzyme that is able to catalyze 
the ligation of blunt-ended DNA molecules, such as bacteriophage T4 DNA ligase. Thus, the products of the reaction 
are DNA segments carrying polymeric linker sequences at their ends. These DNA segments are then cleaved with the 
appropriate restriction enzyme and ligated to an expression vector that has been cleaved with an enzyme that produces
termini compatible with those of the DNA segment. 

Synthetic linkers containing a variety of restriction endonuclease sites are commercially available from a number
of sources including International Biotechnologies Inc, New Haven, CN, USA. 

A desirable way to modify the DNA encoding the polypeptide of the invention is to use the polymerase chain 
reaction as disclosed by Saiki et al (1988) Science 239, 487-491.

In this method the DNA to be enzymatically amplified is flanked by two specific oligonucleotide primers which 
themselves become incorporated into the amplified DNA. The said specific primers may contain restriction endonu- 
clease recognition sites which can be used for cloning into expression vectors using methods known in the art. 
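
How primers can carry restriction sites into the amplified DNA may be sketched as below. The template, the choice of EcoRI and BamHI sites, the short "GC" clamp and the annealing length are all illustrative assumptions; the function names are hypothetical and no melting-temperature or specificity checks are attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primers_with_sites(template, fwd_site, rev_site, anneal_len=18, clamp="GC"):
    """Build a primer pair whose 5' tails carry restriction sites.

    Each primer is: clamp + restriction site + template-specific region.
    The tails do not anneal to the template in the first cycle but become
    part of the amplified product.
    """
    template = template.upper()
    if len(template) < anneal_len:
        raise ValueError("template shorter than the annealing region")
    forward = clamp + fwd_site + template[:anneal_len]
    reverse = clamp + rev_site + reverse_complement(template[-anneal_len:])
    return forward, reverse


# Toy template (placeholder, not a sequence from the figures).
toy_template = "ATGAAACCCGGGTTTTAA"
fwd, rev = primers_with_sites(toy_template, "GAATTC", "GGATCC", anneal_len=6)
print(fwd)
print(rev)
```

Digesting the product with the two enzymes then yields termini suitable for directional cloning into a vector cut with the same pair.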

Variants of the genes also form part of the invention. It is preferred if the variant has at least 70% sequence identity, 
more preferably at least 85% sequence identity, most preferably at least 95% sequence identity with the genes isolated
by the method of the invention. Of course, replacements, deletions and insertions may be tolerated. The degree of 
similarity between one nucleic acid sequence and another can be determined using the GAP program of the University 
of Wisconsin Computer Group. 
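
The identity thresholds can be illustrated with a minimal sketch that scores two sequences which have already been aligned. It does not reproduce the GAP program; the use of alignment columns as the denominator and the tier labels are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def preference_tier(identity):
    """Map a percent identity onto the thresholds stated in the text."""
    if identity >= 95:
        return "most preferred"
    if identity >= 85:
        return "more preferred"
    if identity >= 70:
        return "preferred"
    return "below 70%"


identity = percent_identity("ATGCATGCAT", "ATGCATGGAT")
print(identity, preference_tier(identity))
```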

Similarly, variants of proteins encoded by the genes are included. 
By "variants" we include insertions, deletions and substitutions, either conservative or non-conservative, where
such changes do not substantially alter the normal function of the protein.

By "conservative substitutions" is intended combinations such as Gly, Ala; Val, Ile, Leu; Asp, Glu; Asn, Gln; Ser,
Thr; Lys, Arg; and Phe, Tyr.

Such variants may be made using the well known methods of protein engineering and site-directed mutagenesis.

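The conservative substitution groups listed above can be encoded as a simple lookup. This is only a sketch using standard three-letter amino acid codes; the function name is hypothetical, and a substitution that is "conservative" by this table may still alter function in a particular protein.

```python
# Groups of residues treated as conservative replacements for one another.
CONSERVATIVE_GROUPS = [
    {"Gly", "Ala"},
    {"Val", "Ile", "Leu"},
    {"Asp", "Glu"},
    {"Asn", "Gln"},
    {"Ser", "Thr"},
    {"Lys", "Arg"},
    {"Phe", "Tyr"},
]


def is_conservative(original, replacement):
    """True if the two different residues fall within the same group."""
    if original == replacement:
        return False  # not a substitution at all
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)


print(is_conservative("Val", "Leu"))  # same group
print(is_conservative("Asp", "Lys"))  # different groups
```
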
A ninth aspect of the invention provides a method of identifying a compound which reduces the ability of a micro- 
organism to adapt to a particular environment comprising the steps of selecting a compound which interferes with the 
function of (1) a gene obtained by the method of the second aspect of the invention or of (2) a polypeptide encoded 
by such a gene.

Pairwise screens for compounds which affect the wild type cell but not a cell overproducing a gene isolated by the 
methods of the invention form part of this aspect of the invention. 

For example, in one embodiment one cell is a wild type cell and a second cell is the Salmonella which is made to 
overexpress the gene isolated by the method of the invention. The viability and/or growth of each cell in the particular 
10 environment is determined in the presence of a compound to be tested to identify which compound reduces the viability 
or growth of the wild type cell but not the cell overexpressing the said gene. 

It is preferred if the gene is a virulence gene. 

For example, in one embodiment the microorganism (such as S. typhimurium) is made to over-express the virulence
gene identified by the method of the first aspect of the invention. Each of (a) the "over-expressing" microorganism
and (b) an equivalent microorganism (which does not over-express the virulence gene) is used to infect cells in
culture. Whether a particular test compound will selectively inhibit the virulence gene function is determined by
assessing the amount of the test compound which is required to prevent infection of the host cells by (a) the
over-expressing microorganism and (b) the equivalent microorganism (at least for some virulence gene products it is envisaged
that the test compound will inactivate them, and itself be inactivated, by binding to the virulence gene product). If more
of the compound is required to prevent infection by (a) than by (b) then this suggests that the compound is selective.
It is preferred if the microorganisms (such as Salmonella) are destroyed extracellularly by a mild antibiotic such as
gentamicin (which does not penetrate host cells) and that the effect of the test compound in preventing infection of the
cell by the microorganism is determined by lysing the said cell and determining how many microorganisms are present
(for example by plating on agar).
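
The comparison between (a) and (b) reduces to a ratio of the amounts of compound required. The sketch below uses invented amounts and hypothetical function names; the threshold of 1.0 simply encodes "more is required for (a) than for (b)" and would need to allow for assay variability in practice.

```python
def selectivity_ratio(amount_overexpressing, amount_equivalent):
    """Ratio of the amount needed to block infection by (a) to that needed for (b)."""
    if amount_overexpressing <= 0 or amount_equivalent <= 0:
        raise ValueError("amounts must be positive")
    return amount_overexpressing / amount_equivalent


def suggests_selective(amount_overexpressing, amount_equivalent, min_ratio=1.0):
    """True if more compound is needed against the over-expressing strain."""
    return selectivity_ratio(amount_overexpressing, amount_equivalent) > min_ratio


# Invented example values, in arbitrary but identical units.
print(suggests_selective(8.0, 2.0))  # more compound needed against (a)
print(suggests_selective(2.0, 2.0))  # no difference between strains
```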

Pairwise screens and other screens for compounds are generally disclosed in Kirsch & Di Domenico (1993) in
"The Discovery of Natural Products with a Therapeutic Potential" (Ed, V.P. Gallo), Chapter 6, pages 177-221, Butterworths,
U.K. (incorporated herein by reference).

Pairwise screens can be designed in a number of related formats in which the relative sensitivity to a compound
is compared using two genetically related strains. If the strains differ at a single locus, then a compound specific for
that target can be identified by comparing each strain's sensitivity to the inhibitor. For example, inhibitors specific to
the target will be more active against a super-sensitive test strain when compared to an otherwise isogenic sister strain.
In an agar diffusion format, this is determined by measuring the size of the zone of inhibition surrounding the disc or
well carrying the compound. Because of diffusion, a continuous concentration gradient of compound is set up, and the
strain's sensitivity to inhibitors is proportional to the distance from the disc or well to the edge of the zone. General
antimicrobials, or antimicrobials with modes of action other than the desired one, are generally observed as having
similar activities against the two strains.

Another type of molecular genetic screen involves pairs of strains where a cloned gene product is overexpressed
in one strain compared to a control strain. The rationale behind this type of assay is that the strain containing an
elevated quantity of the target protein should be more resistant to inhibitors specific to the cloned gene product than
an isogenic strain containing normal amounts of the target protein. In an agar diffusion assay, the zone size surrounding
a specific compound is expected to be smaller in the strain overexpressing the target protein compared to an otherwise
isogenic strain.
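
The expected read-out of the over-expression format can be sketched as a comparison of zone sizes. The tolerance used to call two zones "similar" and the category labels are assumptions for illustration; the zone measurements are invented.

```python
def classify_zone_pair(zone_control_mm, zone_overexpressing_mm, tolerance_mm=1.0):
    """Classify a compound from the zone sizes on the two isogenic strains."""
    if zone_control_mm < 0 or zone_overexpressing_mm < 0:
        raise ValueError("zone sizes cannot be negative")
    difference = zone_control_mm - zone_overexpressing_mm
    if difference > tolerance_mm:
        # Smaller zone on the over-expressing strain: the expected pattern
        # for an inhibitor specific to the cloned gene product.
        return "target-specific candidate"
    if difference < -tolerance_mm:
        return "unexpected: larger zone on over-expressing strain"
    # Similar zones: general antimicrobial or a different mode of action.
    return "similar activity"


print(classify_zone_pair(18.0, 11.0))
print(classify_zone_pair(15.0, 15.5))
```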

Additionally or alternatively selection of a compound is achieved in the following steps: 

1. A mutant microorganism obtained using the method of the first aspect of the invention is used as a control (it
has a given phenotype, for example, avirulence).

2. A compound to be tested is mixed with the wild-type microorganism. 

3. The wild-type microorganism is introduced into the environment (with or without the test compound).

4. If the wild-type microorganism is unable to adapt to the environment (following treatment by, or in the presence 
of, the compound), the compound is one which reduces the ability of the microorganism to adapt to, or survive in, 
the particular environment. 


When the environment is an animal body and the microorganism is a pathogenic microorganism, the compound 
identified by this method can be used in a medicament to prevent or ameliorate infection with the microorganism. 
A tenth aspect of the invention therefore provides a compound identifiable by the method of the ninth aspect.

It will be appreciated that uses of the compound of the tenth aspect are related to the method by which it can be
identified, and in particular in relation to the host of a pathogenic microorganism. For example, if the compound is 
identifiable by a method which uses a virulence gene, or polypeptide encoded thereby, from a bacterium which infects 
a mammal, the compound may be useful in treating infection of a mammal by that bacterium. 
Similarly, if the compound is identifiable by a method which uses a virulence gene, or polypeptide encoded thereby,
from a fungus which infects a plant, the compound may be useful in treating infection of a plant by that fungus.

An eleventh aspect of the invention provides a molecule which selectively interacts with, and substantially inhibits 
the function of, a gene of the seventh aspect of the invention or a nucleic acid product thereof. 

By "nucleic acid product thereof" we include any RNA, especially mRNA, transcribed from the gene. 
Preferably a molecule which selectively interacts with, and substantially inhibits the function of, said gene or said
nucleic acid product is an antisense nucleic acid or nucleic acid derivative.

More preferably, said molecule is an antisense oligonucleotide. 

Antisense oligonucleotides are single-stranded nucleic acid, which can specifically bind to a complementary nucleic 
acid sequence. By binding to the appropriate target sequence, an RNA-RNA, a DNA-DNA, or RNA-DNA duplex is
formed. These nucleic acids are often termed "antisense" because they are complementary to the sense or coding
strand of the gene. Recently, formation of a triple helix has proven possible where the oligonucleotide is bound to a 
DNA duplex. It was found that oligonucleotides could recognise sequences in the major groove of the DNA double 
helix. A triple helix was formed thereby. This suggests that it is possible to synthesise sequence-specific molecules 
which specifically bind double-stranded DNA via recognition of major groove hydrogen binding sites. 

Clearly, the sequence of the antisense nucleic acid or oligonucleotide can readily be determined by reference to
the nucleotide sequence of the gene in question. For example, antisense nucleic acid or oligonucleotides can be designed
which are complementary to a part of the sequence shown in Figures 11 or 12, especially to sequences which
form a part of a virulence gene. 
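
Deriving an antisense oligonucleotide from a known sense-strand sequence amounts to taking the reverse complement of the chosen target window. The sketch below assumes a DNA oligonucleotide and uses a made-up sequence, not one from the figures; the function name and window parameters are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(sense_strand, start, length):
    """Return the antisense DNA oligonucleotide (written 5'->3') for a target window.

    sense_strand is the coding-strand sequence written 5'->3'; start is a
    0-based offset and length is the number of nucleotides to target.
    """
    sense_strand = sense_strand.upper()
    if start < 0:
        raise ValueError("start must not be negative")
    target = sense_strand[start:start + length]
    if len(target) != length or set(target) - set("ACGT"):
        raise ValueError("target window is out of range or not plain DNA")
    return target.translate(COMPLEMENT)[::-1]


toy_sense = "ATGGCTAGCAAAGGT"  # placeholder sequence
print(antisense_oligo(toy_sense, 0, 9))
```

The returned oligonucleotide base-pairs with the selected stretch of the sense strand, and equally with the corresponding mRNA.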

Oligonucleotides are subject to being degraded or inactivated by cellular endogenous nucleases. To counter this
problem, it is possible to use modified oligonucleotides, eg having altered internucleotide linkages, in which the naturally
occurring phosphodiester linkages have been replaced with another linkage. For example, Agrawal et al (1988) Proc.
Natl. Acad. Sci. USA 85, 7079-7083 showed increased inhibition in tissue culture of HIV-1 using oligonucleotide
phosphoramidates and phosphorothioates. Sarin et al (1988) Proc. Natl. Acad. Sci. USA 85, 7448-7451 demonstrated
increased inhibition of HIV-1 using oligonucleotide methylphosphonates. Agrawal et al (1989) Proc. Natl. Acad. Sci.
USA 86, 7790-7794 showed inhibition of HIV-1 replication in both early-infected and chronically infected cell cultures,
using nucleotide sequence-specific oligonucleotide phosphorothioates. Leither et al (1990) Proc. Natl. Acad. Sci. USA
87, 3430-3434 report inhibition in tissue culture of influenza virus replication by oligonucleotide phosphorothioates.

Oligonucleotides having artificial linkages have been shown to be resistant to degradation in vivo. For example, 
Shaw et al (1991) in Nucleic Acids Res. 19, 747-750, report that otherwise unmodified oligonucleotides become more
resistant to nucleases in vivo when they are blocked at the 3' end by certain capping structures and that uncapped
oligonucleotide phosphorothioates are not degraded in vivo. 

A detailed description of the H-phosphonate approach to synthesizing oligonucleoside phosphorothioates is pro- 
vided in Agrawal and Tang (1990) Tetrahedron Letters 31, 7541-7544, the teachings of which are hereby incorporated 
herein by reference. Syntheses of oligonucleoside methylphosphonates, phosphorodithioates, phosphoramidates,
phosphate esters, bridged phosphoramidates and bridged phosphorothioates are known in the art. See, for example,
Agrawal and Goodchild (1987) Tetrahedron Letters 28, 3539; Nielsen et al (1988) Tetrahedron Letters 29, 2911; Jager
et al (1988) Biochemistry 27, 7237; Uznanski et al (1987) Tetrahedron Letters 28, 3401; Bannwarth (1988) Helv. Chim.
Acta. 71, 1517; Crosstick and Vyle (1989) Tetrahedron Letters 30, 4693; Agrawal et al (1990) Proc. Natl. Acad. Sci.
USA 87, 1401-1405, the teachings of which are incorporated herein by reference. Other methods for synthesis or
production also are possible. In a preferred embodiment the oligonucleotide is a deoxyribonucleic acid (DNA), although
ribonucleic acid (RNA) sequences may also be synthesized and applied. 

The oligonucleotides useful in the invention preferably are designed to resist degradation by endogenous nucleolytic
enzymes. In vivo degradation of oligonucleotides produces oligonucleotide breakdown products of reduced
length. Such breakdown products are more likely to engage in non-specific hybridization and are less likely to be
effective, relative to their full-length counterparts. Thus, it is desirable to use oligonucleotides that are resistant to
degradation in the body and which are able to reach the targeted cells. The present oligonucleotides can be rendered
more resistant to degradation in vivo by substituting one or more internal artificial internucleotide linkages for the native
phosphodiester linkages, for example, by replacing phosphate with sulphur in the linkage. Examples of linkages that
may be used include phosphorothioates, methylphosphonates, sulphone, sulphate, ketyl, phosphorodithioates, various
phosphoramidates, phosphate esters, bridged phosphorothioates and bridged phosphoramidates. Such examples are
illustrative, rather than limiting, since other internucleotide linkages are known in the art. See, for example, Cohen
(1990) Trends in Biotechnology. The synthesis of oligonucleotides having one or more of these linkages substituted
for the phosphodiester internucleotide linkages is well known in the art, including synthetic pathways for producing
oligonucleotides having mixed internucleotide linkages.

Oligonucleotides can be made resistant to extension by endogenous enzymes by "capping" or incorporating similar 
groups on the 5' or 3' terminal nucleotides. A reagent for capping is commercially available as Amino-Link II™ from 
Applied BioSystems Inc, Foster City, CA. Methods for capping are described, for example, by Shaw et al (1991) Nucleic
Acids Res. 19, 747-750 and Agrawal et al (1991) Proc. Natl. Acad. Sci. USA 88(17), 7595-7599, the teachings of which
are hereby incorporated herein by reference. 

A further method of making oligonucleotides resistant to nuclease attack is for them to be "self-stabilized" as 
described by Tang et al (1993) Nucl. Acids Res. 21, 2729-2735 incorporated herein by reference. Self-stabilized
oligonucleotides have hairpin loop structures at their 3' ends, and show increased resistance to degradation by snake
venom phosphodiesterase, DNA polymerase I and fetal bovine serum. The self-stabilized region of the oligonucleotide
does not interfere in hybridization with complementary nucleic acids, and pharmacokinetic and stability studies in mice 
have shown increased in vivo persistence of self-stabilized oligonucleotides with respect to their linear counterparts. 

In accordance with the invention, the inherent binding specificity of antisense oligonucleotides characteristic of 
base pairing is enhanced by limiting the availability of the antisense compound to its intended locus in vivo, permitting
lower dosages to be used and minimizing systemic effects. Thus, oligonucleotides are applied locally to achieve the
desired effect. The concentration of the oligonucleotides at the desired locus is much higher than if the oligonucleotides 
were administered systemically, and the therapeutic effect can be achieved using a significantly lower total amount. 
The local high concentration of oligonucleotides enhances penetration of the targeted cells and effectively blocks trans- 
lation of the target nucleic acid sequences. 

The oligonucleotides can be delivered to the locus by any means appropriate for localized administration of a drug.
For example, a solution of the oligonucleotides can be injected directly to the site or can be delivered by infusion using
an infusion pump. The oligonucleotides also can be incorporated into an implantable device which when placed at the 
desired site, permits the oligonucleotides to be released into the surrounding locus. 

The oligonucleotides are most preferably administered via a hydrogel material. The hydrogel is noninflammatory
and biodegradable. Many such materials now are known, including those made from natural and synthetic polymers.
In a preferred embodiment, the method exploits a hydrogel which is liquid below body temperature but gels to form a
shape-retaining semisolid hydrogel at or near body temperature. Preferred hydrogels are polymers of ethylene oxide-propylene
oxide repeating units. The properties of the polymer are dependent on the molecular weight of the polymer
and the relative percentage of polyethylene oxide and polypropylene oxide in the polymer. Preferred hydrogels contain
from about 10 to about 80% by weight ethylene oxide and from about 20 to about 90% by weight propylene oxide. A
particularly preferred hydrogel contains about 70% polyethylene oxide and 30% polypropylene oxide. Hydrogels which
can be used are available, for example, from BASF Corp., Parsippany, NJ, under the tradename Pluronic®.

In this embodiment, the hydrogel is cooled to a liquid state and the oligonucleotides are admixed into the liquid to 
a concentration of about 1 mg oligonucleotide per gram of hydrogel. The resulting mixture then is applied onto the
surface to be treated, for example by spraying or painting during surgery or using a catheter or endoscopic procedures.
As the polymer warms, it solidifies to form a gel, and the oligonucleotides diffuse out of the gel into the surrounding 
cells over a period of time defined by the exact composition of the gel. 

The oligonucleotides can be administered by means of other implants that are commercially available or described 
in the scientific literature, including liposomes, microcapsules and implantable devices. For example, implants made
of biodegradable materials such as polyanhydrides, polyorthoesters, polylactic acid and polyglycolic acid and
copolymers thereof, collagen, and protein polymers, or non-biodegradable materials such as ethylenevinyl acetate (EVAc),
polyvinyl acetate, ethylene vinyl alcohol, and derivatives thereof can be used to locally deliver the oligonucleotides.
The oligonucleotides can be incorporated into the material as it is polymerized or solidified, using melt or solvent
evaporation techniques, or mechanically mixed with the material. In one embodiment, the oligonucleotides are mixed
into or applied onto coatings for implantable devices such as dextran coated silica beads, stents, or catheters.

The dose of oligonucleotides is dependent on the size of the oligonucleotides and the purpose for which it is
administered. In general, the range is calculated based on the surface area of tissue to be treated. The effective dose
of oligonucleotide is somewhat dependent on the length and chemical composition of the oligonucleotide but is
generally in the range of about 30 to 3000 µg per square centimetre of tissue surface area.
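
As a purely arithmetical illustration of the surface-area-based range stated above, the following minimal sketch multiplies a tissue area by the lower and upper bounds of about 30 and 3000 µg per square centimetre. The function name and the 2.5 cm² example area are hypothetical and are not taken from the specification; the sketch is not a dosing recommendation.

```python
def dose_range_ug(area_cm2, low_ug_per_cm2=30.0, high_ug_per_cm2=3000.0):
    """Return (lower, upper) total dose in micrograms for a given tissue area.

    The default bounds are the approximate range quoted in the text;
    the area is supplied by the caller.
    """
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return area_cm2 * low_ug_per_cm2, area_cm2 * high_ug_per_cm2


# Hypothetical example: a 2.5 cm^2 treatment site.
low, high = dose_range_ug(2.5)
print(f"{low:.0f} to {high:.0f} ug in total")
```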

The oligonucleotides may be administered to the patient systemically for both therapeutic and prophylactic
purposes. The oligonucleotides may be administered by any effective method, for example, parenterally (e.g. intravenously,
subcutaneously, intramuscularly) or by oral, nasal or other means which permit the oligonucleotides to access and
circulate in the patient's bloodstream. Oligonucleotides administered systemically preferably are given in addition to
locally administered oligonucleotides, but also have utility in the absence of local administration. A dosage in the range
of from about 0.1 to about 10 grams per administration to an adult human generally will be effective for this purpose.

It will be appreciated that the molecules of this aspect of the invention are useful in treating or preventing any 
infection caused by the microorganism from which the said gene has been isolated, or a close relative of said micro- 
organism. Thus, the said molecule is an antibiotic. 
Thus, a twelfth aspect of the invention provides a molecule of the eleventh aspect of the invention for use in
medicine. 

A thirteenth aspect of the invention provides a method of treating a host which has, or is susceptible to, an infection 
with a microorganism, the method comprising administering an effective amount of a molecule according to the eleventh
aspect of the invention wherein said gene is present in said microorganism, or a close relative of said microorganism.

By "effective amount" we mean an amount which substantially prevents or ameliorates the infection. By "host" we 
include any animal or plant which may be infected by a microorganism. 

It will be appreciated that pharmaceutical formulations of the molecule of the eleventh aspect of the invention form 
part of the invention. Such pharmaceutical formulations comprise the said molecule together with one or more
acceptable carriers. The carrier(s) must be "acceptable" in the sense of being compatible with the said molecule of the
invention and not deleterious to the recipients thereof. Typically, the carriers will be water or saline which will be sterile
and pyrogen free. 

As mentioned above, and as described in more detail in Example 4 below, I have found that certain virulence genes 
are clustered in Salmonella typhimurium in a region of the chromosome that I have called VGC2. DNA-DNA
hybridisation experiments have determined that sequences homologous to at least part of VGC2 are found in many species
and strains of Salmonella but are not present in the E. coli and Shigella strains tested (see Example 4). These
sequences almost certainly correspond to conserved genes, at least in Salmonella, at least some of which are
virulence genes. It is believed that equivalent genes in other Salmonella species and, if present, equivalent genes in
other enteric or other bacteria will also be virulence genes. 

Whether a gene within the VGC2 region is a virulence gene is readily determined. For example, those genes within
VGC2 which have been identified by the method of the second aspect of the invention (when applied to Salmonella
typhimurium and wherein the environment is an animal such as a mouse) are virulence genes. Virulence genes are 
also identified by making a mutation in the gene (preferably a non-polar mutation) and determining whether the mutant 
strain is avirulent. Methods of making mutations in a selected gene are well known and are described below. 

A fourteenth aspect of the invention provides the VGC2 DNA of Salmonella typhimurium or a part thereof, or a
variant of said DNA or a variant of a part thereof.

The VGC2 DNA of Salmonella typhimurium is depicted diagrammatically in Figure 8 and is readily obtainable from
Salmonella typhimurium ATCC 14028 (available from the American Type Culture Collection, 12301 Parklawn Drive,
Rockville, Maryland 20852, USA; also deposited at the NCTC, Public Health Laboratory Service, Colindale, UK under
accession no. NCTC 12021) using the information provided in Example 4. For example, probes derived from the
sequences shown in Figures 11 and 12 may be used to identify λ clones from a Salmonella typhimurium genomic library.
Standard genome walking methods can be employed to obtain all of the VGC2 DNA. The restriction map shown in 
Figure 8 can be used to identify and locate DNA fragments from VGC2. 

By "part of the VGC2 DNA of Salmonella typhimurium" we mean any DNA sequence which comprises at least 10
nucleotides, preferably at least 20 nucleotides, more preferably at least 50 nucleotides, still more preferably at least
100 nucleotides, and most preferably at least 500 nucleotides of VGC2. A particularly preferred part of the VGC2 DNA 
is the sequence shown in Figure 11, or a part thereof. Another particularly preferred part of the VGC2 DNA is the 
sequence shown in Figure 12, or a part thereof. 

Advantageously, the part of the VGC2 DNA is a gene, or part thereof. 

Genes can be identified within the VGC2 region by statistical analysis of the open reading frames using computer
programs known in the art. If an open reading frame is greater than about 100 codons it is likely to be a gene (although 
genes smaller than this are known). Whether an open reading frame corresponds to the polypeptide coding region of 
a gene can be determined experimentally. For example, a part of the DNA corresponding to the open reading frame 
may be used as a probe in a northern (RNA) blot to determine whether mRNA is expressed which hybridises to the
said DNA; alternatively or additionally a mutation may be introduced into the open reading frame and the effect of the
mutation on the phenotype of the microorganism can be determined. If the phenotype is changed then the open reading 
frame corresponds to a gene. Methods of identifying genes within a DNA sequence are known in the art. 
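
The length criterion mentioned above (an open reading frame of more than about 100 codons) can be sketched as a simple six-frame scan. This is a minimal illustration only and is not the statistical analysis software referred to in the text: it assumes ATG as the sole start codon (bacterial genes may also start at GTG or TTG), uses the standard stop codons, and the function and variable names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """List ATG-to-stop open reading frames in all six reading frames.

    Each hit is (strand, frame, start, end, codons). start and end are
    0-based positions on the scanned strand, end is exclusive and
    includes the stop codon; codons excludes the stop codon.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        hits.append((strand, frame, start, i + 3, n_codons))
                    start = None
    return hits


# Synthetic example: ATG followed by 120 alanine codons and a stop codon.
example = "ATG" + "GCT" * 120 + "TAA"
print(find_orfs(example))
```

A candidate found in this way would still need the experimental confirmation described above (northern blotting or mutational analysis) before being treated as a gene.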

By "variant of said DNA or a variant of a part thereof" we include any variant as defined by the term "variant" in
the seventh aspect of the invention. 

Thus, variants of VGC2 DNA of Salmonella typhimurium include equivalent genes, or parts thereof, from other
Salmonella species, such as Salmonella typhi and Salmonella enterica, as well as equivalent genes, or parts thereof,
from other bacteria such as other enteric bacteria. 

By "equivalent gene" we include genes which are functionally equivalent and those in which a mutation leads to 
a similar phenotype (such as avirulence). It will be appreciated that before the present invention VGC2 or the genes
contained therein had not been identified and certainly not implicated in virulence determination.

Thus, further aspects of the invention provide a mutant bacterium wherein if the bacterium normally contains a 
gene that is the same as or equivalent to a gene in VGC2, said gene is mutated or absent in said mutant bacterium; 
methods of making a mutant bacterium wherein if the bacterium normally contains a gene that is the same as or 
equivalent to a gene in VGC2, said gene is mutated or absent in said mutant bacterium. The following is a preferred
method to inactivate a VGC2 gene. One first subclones the gene on a DNA fragment from a Salmonella λ DNA library
or other DNA library using a fragment of VGC2 as a probe in hybridisation experiments, maps the gene with respect
to restriction enzyme sites and characterises the gene by DNA sequencing in Escherichia coli. Using restriction enzymes,
one then introduces into the coding region of the gene a segment of DNA encoding resistance to an antibiotic (for
example, kanamycin), possibly after deleting a portion of the coding region of the cloned gene by restriction enzymes.
Methods and DNA constructs containing an antibiotic resistance marker are available to ensure that the inactivation
of the gene of interest is preferably non-polar, that is to say, does not affect the expression of genes downstream from
the gene of interest. The mutant version of the gene is then transferred from E. coli to Salmonella typhimurium using
phage P22 transduction and transductants are checked by Southern hybridisation for homologous recombination of the
mutant gene into the chromosome.

This approach is commonly used in Salmonella (and can be used in S. typhi), and further details can be found in 
many papers, including Galan et al (1992) 174, 4338-4349.

Still further aspects provide a use of said mutant bacterium in a vaccine; pharmaceutical compositions
comprising said bacterium and a pharmaceutically acceptable carrier; a polypeptide encoded by VGC2 DNA of
Salmonella typhimurium or a part thereof, or a variant of a part thereof; a method of identifying a compound which reduces
the ability of a bacterium to infect or cause disease in a host; a compound identifiable by said method; a molecule 
which selectively interacts with, and substantially inhibits the function of, a gene in VGC2 or a nucleic acid product thereof;
and medical uses and pharmaceutical compositions thereof. 

The VGC2 DNA contains genes which have been identified by the methods of the first and second aspects of the
invention as well as genes which have been identified by their location (although identifiable by the methods of the first
and second aspects of the invention). These further aspects of the invention relate closely to the fourth, fifth, sixth, 
seventh, eighth, ninth, tenth, eleventh, twelfth and thirteenth aspects of the invention and, accordingly, the information 
given in relation to those aspects, and preferences expressed in relation to those aspects, applies to these further
aspects.

It is preferred if the gene is from VGC2 or is an equivalent gene from another species of Salmonella such as S. 
typhi. It is preferred if the mutant bacterium is a S. typhimurium mutant or a mutant of another species of Salmonella 
such as S. typhi. 

It is believed that at least some of the genes in VGC2 confer the ability for the bacterium, such as S. typhimurium,
to enter cells.

The invention will now be described with reference to the following Examples and Figures wherein: 
Figure 1 illustrates diagrammatically one particularly preferred method of the invention. 

Figure 2 shows a Southern hybridisation analysis of DNA from 12 S. typhimurium exconjugants following digestion
with EcoRV. The filter was probed with the kanamycin resistance gene of the mini-Tn5 transposon.
Figure 3 shows a colony blot hybridisation analysis of DNA from 48 S. typhimurium exconjugants from a half of a
microtitre dish (A1-H6). The filter was hybridised with a probe comprising labelled amplified tags from DNA isolated
from a pool of the first 24 colonies (A1-D6).

Figure 4 shows a DNA colony blot hybridisation analysis of 95 S. typhimurium exconjugants of a microtitre dish
(A1-H11), which were injected into a mouse. Replicate filters were hybridised with labelled amplified tags from the pool
(inoculum pattern), or with labelled amplified tags from DNA isolated from over 10,000 pooled colonies that were
recovered from the spleen of the infected animal (spleen pattern). Colonies B6, A11 and C8 gave rise to weak hybridisation
signals on both sets of filters. Hybridisation signals from colonies A3, C5, G3 (aroA), and F10 are present on the
inoculum pattern but not on the spleen pattern.

Figure 5 shows the sequence of a Salmonella gene isolated using the method of the invention and a comparison
to the Escherichia coli clp protease genome.

Figure 6 shows partial sequences of further Salmonella genes isolated using the method of the invention (SEQ ID
Nos. 8 to 36). 

Figure 7 shows the mapping of VGC2 on the S. typhimurium chromosome. (A) DNA probes from three regions of
VGC2 were used in Southern hybridisation analysis of lysates from a set of S. typhimurium strains harbouring locked
in Mud-P22 prophages. Lysates which hybridised to a 7.5 kb PstI fragment (probe A in Figure 8) are shown. The other
two probes used hybridised to the same lysates. (B) The insertion points and packaging directions of the phage are
shown along with the map position in minutes (edition VIII, ref 22 in Example 4). The phage designations correspond
to the following strains: 18P, TT15242; 18Q, 15241; 19P, TT15244; 19Q, TT15243; 20P, TT15246 and 20Q, TT15245
(Ref in Example 4). The locations of mapped genes are shown by horizontal bars and the approximate locations of
other genes are indicated.

Figure 8 shows a physical and genetic map of VGC2. (A) The positions of 16 transposon insertions are shown 
above the line. The extent of VGC2 is indicated by the thicker line. The position and direction of transcription of ORFs 
described in the text of Example 4 are shown by arrows below the line, together with the names of similar genes, with 






EP 0 889 120 A1 



the exception of ORFs 12 and 13 whose products are similar to the sensor and regulatory components, respectively,
of a variety of two component regulatory systems. (B) The location of overlapping clones and an EcoRV/XbaI restriction
fragment from Mud-P22 prophage strain TT15244 are shown as filled bars. Only the portions of the λ clones which
have been mapped are shown and the clones may extend beyond these limits.

(C) The positions of restriction sites are marked: B, BamHI; E, EcoRI; V, EcoRV; H, HindIII; P, PstI and X, XbaI.
The positions of the 7.5 kb PstI fragment (probe A) used as a probe in Figure 7 and that of the 2.2 kb PstI/HindIII
fragment (probe B) used as a probe in Figure 10 are shown below the restriction map. The positions of Sequence 1
(described in Figure 11) and Sequence 2 (described in Figure 12) are shown by the thin arrows (labelled Sequence 1
and Sequence 2).

Figure 9 describes mapping the boundaries of VGC2. (A) The positions of mapped genes at minutes 37 to 38 on
the E. coli K12 chromosome are aligned with the corresponding region of the S. typhimurium LT2 chromosome (minutes
30 to 31). An expanded map of the VGC2 region is shown with 11 S. typhimurium (S. t.) DNA fragments used as probes
(thick bars) and the restriction sites used to generate them: B, BamHI; C, ClaI; H, HindII; K, KpnI; P, PstI; N, NsiI and
S, SalI. Probes that hybridised to E. coli K12 (E. c.) genomic DNA are indicated by +; those which failed to hybridise
are indicated by -.

Figure 10 shows that VGC2 is conserved among and specific to the Salmonellae. Genomic DNA from Salmonella
serovars and other pathogenic bacteria was restricted with PstI (A), HindIII or EcoRV (B) and subjected to Southern
hybridisation analysis, using a 2.2 kb PstI/HindIII fragment from λ clone 7 as a probe (probe B Figure 2). The filters
were hybridised and washed under stringent (A) or non-stringent (B) conditions.

Figure 11 shows the DNA sequence of "Sequence 1" of VGC2 from the centre to the left-hand end (see the arrow
labelled Sequence 1 in Figure 2). The DNA is translated in all six reading frames and the start and stop positions of 
putative genes, and the transposon insertion positions for various mutants identified by STM are indicated (SEQ ID 
No 37). 

As is conventional a * indicates a stop codon and standard nucleotide ambiguity codes are used where necessary. 

Figure 12 shows the DNA sequence of "Sequence 2" of VGC2 (cluster C) (see the arrow labelled Sequence 2 in 
Figure 2). The DNA is translated in all six reading frames and the start and stop positions of putative genes, and the 
transposon insertion positions for various mutants identified by STM are indicated (SEQ ID No 38). 

As is conventional a * indicates a stop codon and standard nucleotide ambiguity codes are used where necessary. 

Figures 7 to 12 are most relevant to Example 4.

Example 1: Identification of virulence genes in Salmonella typhimurium 
Materials and Methods 

Bacterial Strains and Plasmids 

Salmonella typhimurium strain 12023 (equivalent to American Type Culture Collection (ATCC) strain 14028) was 
obtained from the National Collection of Type Cultures (NCTC), Public Health Laboratory Service, Colindale, London, 
UK. A spontaneous nalidixic acid resistant mutant of this strain (12023 Nalʳ) was selected in our laboratory. Another
derivative of strain 12023, CL1509 (aroA::Tn10) was a gift from Fred Heffron. Escherichia coli strains CC118 λpir
(Δ[ara-leu], araD, ΔlacX74, galE, galK, phoA20, thi-1, rpsE, rpoB, argE(Am), recA1, λpir phage lysogen) and S17-1 λpir
(Tpʳ, Smʳ, recA, thi, pro, hsdR-M+, RP4:2-Tc:Mu:Km Tn7, λpir) were gifts from Kenneth Timmis. E. coli DH5α was used
for propagating pUC18 (Gibco-BRL) and Bluescript (Stratagene) plasmids containing S. typhimurium DNA. Plasmid
pUTmini-Tn5Km2 (de Lorenzo et al, 1990) was a gift from Kenneth Timmis.

Construction of semi-random sequence tags and ligations 

The oligonucleotide pool RT1 (5'-CTAGGTACCTACAACCTCAAGCTT-[NK]20-AAGCTTGGTTAGAATGGGTACCATG-3')
(SEQ ID No 1), and primers P2 (5'-TACCTACAACCTCAAGCT-3') (SEQ ID No 2), P3 (5'-CATGGTACCCATTCTAAC-3')
(SEQ ID No 3), P4 (5'-TACCCATTCTAACCAAGC-3') (SEQ ID No 4) and P5 (5'-CTAGGTACCTACAACCTC-3')
(SEQ ID No 5) were synthesized on an oligonucleotide synthesizer (Applied Biosystems, model 380B). Double
stranded DNA tags were prepared from RT1 in a 100 µl volume PCR containing 1.5 mM MgCl2, 50 mM KCl, and 10
mM Tris-Cl (pH 8.0) with 200 pg of RT1 as target; 250 µM each dATP, dCTP, dGTP, dTTP; 100 pM of primers P3 and
P5; and 2.5 U of Amplitaq (Perkin-Elmer Cetus). Thermal cycling conditions were 30 cycles of 95°C for 30 s, 50°C for
45 s, and 72°C for 10 s. The PCR product was gel purified (Sambrook et al, 1989), passed through an elutipD column
(available from Schleicher and Schull) and digested with KpnI prior to ligation into pUC18 or pUTmini-Tn5Km2. For
ligations, plasmids were digested with KpnI and dephosphorylated with calf intestinal alkaline phosphatase (Gibco-
BRL). Linearized plasmid molecules were gel-purified (Sambrook et al, 1989) prior to ligation to remove any residual
uncut plasmid DNA from the digestion. Ligation reactions contained approximately 50 ng each of plasmid and double
stranded tag DNA in a 25 µl volume with 1 unit T4 DNA ligase (Gibco-BRL) in a buffer supplied with the enzyme.
Ligations were carried out for 2 h at 24°C. To determine the proportion of bacterial colonies arising from either self
ligation of the plasmid DNA or uncut plasmid DNA, a control reaction was carried out in which the double stranded tag
DNA was omitted from the ligation reaction. This yielded no ampicillin resistant bacterial colonies following
transformation of E. coli CC118 (Sambrook et al, 1989), compared with 185 colonies arising from a ligation reaction containing
the double stranded tag DNA.
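
The control result above (no colonies without tag DNA against 185 colonies with tag DNA) can be turned into a rough bound on the fraction of untagged plasmids. The sketch below is one simple way of doing this and is an assumption, not the calculation used in the specification: it treats a control count of zero as "fewer than one colony" and divides by the number of colonies from the tagged ligation. The function name is hypothetical.

```python
def background_upper_bound(control_colonies, test_colonies):
    """Rough upper bound on the fraction of colonies lacking a tag.

    A control count of zero is treated as 'fewer than one colony', so
    the bound is 1 / test_colonies. This is a heuristic, not a formal
    confidence limit.
    """
    if test_colonies <= 0:
        raise ValueError("test_colonies must be positive")
    observed = max(control_colonies, 1)
    return min(observed / test_colonies, 1.0)


bound = background_upper_bound(0, 185)
print(f"untagged fraction below about {bound:.2%}")
```

With the colony counts given above the bound is roughly half of one percent. A formal treatment would use a Poisson confidence limit, which gives a somewhat larger figure.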

Bacterial Transformation and Matings 


The products of several ligations between pUT mini-Tn5Km2 and the double stranded tag DNA were used to
transform E. coli CC118 (Sambrook et al, 1989). A total of approximately 10,300 transformants were pooled and plasmid
DNA extracted from the pool was used to transform E. coli S17-1 λpir (de Lorenzo & Timmis, 1994). For mating
experiments, a pool of approximately 40,000 ampicillin resistant E. coli S17-1 λpir transformants, and S. typhimurium 12023
Nalʳ were cultured separately to an optical density (OD) of 1.0. Aliquots of each culture (0.4 ml) were mixed in 5 ml
10 mM MgSO4 and filtered through a Millipore membrane (0.45 µm diameter). The filters were placed on the surface
of agar containing M9 salts (de Lorenzo & Timmis, 1994) and incubated at 37°C for 16 h. The bacteria were recovered
by shaking the filters in liquid LB medium for 40 min at 37°C and exconjugants were selected by plating the suspension
onto LB medium containing 100 µg ml⁻¹ nalidixic acid (to select against the donor strain) and 50 µg ml⁻¹ kanamycin (to
select for the recipient strain). Each exconjugant was checked by transferring nalidixic acid resistant (nalʳ), kanamycin
resistant (kanʳ) colonies to MacConkey Lactose indicator medium (to distinguish between E. coli and S. typhimurium),
and to LB medium containing ampicillin. Approximately 90% of the nalʳ, kanʳ colonies were sensitive to ampicillin,
indicating that these resulted from authentic transposition events (de Lorenzo & Timmis, 1994). Individual
ampicillin-sensitive exconjugants were stored in 96 well microtitre dishes containing LB medium. For long term storage at -80°C,
either 7% DMSO or 15% glycerol was included in the medium.

Phenotypic characterisation of mutants 

Mutants were replica plated from microtitre dishes onto solid medium containing M9 salts and 0.4% glucose
(Sambrook et al, 1989) to identify auxotrophs. Mutants with rough colony morphology were detected by low magnification
microscopy of colonies on agar plates.

Colony Blots, DNA extractions, PCRs, DNA labelings and hybridisations 

For colony blot hybridizations, a 48-well metal replicator (Sigma) was used to transfer exconjugants from microtitre
dishes to Hybond N nylon filters (Amersham, UK) that had been placed on the surface of LB agar containing 50 µg
ml⁻¹ kanamycin. After overnight incubation at 37°C, the filters supporting the bacterial colonies were removed and
dried at room temperature for 10 min. The bacteria were lysed with 0.4 N NaOH and the filters washed with 0.5 N
Tris-Cl pH 7.0 according to the filter manufacturer's instructions. The bacterial DNA was fixed to the filters by exposure to
UV light from a Stratalinker (Stratagene). Hybridisations to 32P-labelled probes were carried out under stringent
conditions as previously described (Holden et al, 1989). For DNA extractions, S. typhimurium transposon mutant strains
were grown in liquid LB medium in microtitre dishes or resuspended in LB medium following growth on solid media.
Total DNA was prepared by the hexadecyltrimethylammoniumbromide (CTAB) method according to Ausubel et al
(1987). Briefly, cells from 150 to 1000 µl volumes were precipitated by centrifugation and resuspended in 576 µl TE.
To this was added 15 µl of 20% SDS and 3 µl of 20 mg ml⁻¹ proteinase K. After incubating at 37°C for 1 hour, 166 µl
of 3 M NaCl was added and mixed thoroughly, followed by 80 µl of 10% (w/v) CTAB and 0.7 M NaCl. After thorough
mixing, the solution was incubated at 65°C for 10 min. Following extraction with phenol and phenol-chloroform, the
DNA was precipitated by addition of isopropanol, washed with 70% ethanol and resuspended in TE at a concentration
of approximately 1 µg µl⁻¹.

The DNA samples were subjected to two rounds of PCR to generate labelled probes. The first PCR was performed
in 100 µl reactions containing 20 mM Tris-Cl pH 8.3; 50 mM KCl; 2 mM MgCl2; 0.01% Tween 80; 200 µM each dATP,
dCTP, dGTP, dTTP; 2.5 units of Amplitaq polymerase (Perkin-Elmer Cetus); 770 ng each primer P2 and P4; and 5 µg
target DNA. After an initial denaturation of 4 min at 95°C, thermal cycling consisted of 20 cycles of 45 s at 50°C, 10 s
at 72°C, and 30 s at 95°C. PCR products were extracted with chloroform/isoamyl alcohol (24/1) and precipitated with
ethanol. DNA was resuspended in 10 µl TE and the PCR products were purified by electrophoresis through a 1.6%
Seaplaque (FMC Bioproducts) gel in TAE buffer. Gel slices containing fragments of about 80 bp were excised and
used for the second PCR. This reaction was carried out in a 20 µl total volume, and contained 20 mM Tris-Cl pH 8.3;
50 mM KCl; 2 mM MgCl2; 0.01% Tween 80; 50 µM each dATP, dTTP, dGTP; 10 µl 32P-dCTP (3000 Ci/mmol, Amersham);
150 ng each primer P2 and P4; approximately 10 ng of target DNA (1-2 µl of 1.6% Seaplaque agarose containing
the first round PCR product); and 0.5 units of Amplitaq polymerase. The reaction was overlaid with 20 µl mineral oil and
thermal cycling was performed as described above. Incorporation of the radioactive label was quantitated by adsorption
to Whatman DE81 paper (Sambrook et al, 1989).

Infection Studies 

Individual Salmonella exconjugants containing tagged transposons were grown in 2% tryptone, 1% yeast extract,
0.92% v/v glycerol, 0.5% Na2PO4, 1% KNO3 (TYGPN medium) (Ausubel et al, 1987) in microtitre plates overnight at
37°C. A metal replicator was used to transfer a small volume of the overnight cultures to a fresh microtitre plate and
the cultures were incubated at 37°C until the OD (measured using a Titertek Multiscan microtitre plate reader) was
approximately 0.2 in each well. Cultures from individual wells were then pooled and the OD determined using a
spectrophotometer. The culture was diluted in sterile saline to approximately 5 x 10⁵ cfu ml⁻¹. Further dilutions were
plated out onto TYGPN containing nalidixic acid (100 µg ml⁻¹) and kanamycin (50 µg ml⁻¹) to confirm the cfu present
in the inoculum.
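
The plate-count confirmation of the inoculum can be illustrated with the usual viable-count arithmetic. The sketch below is illustrative only: the colony count, dilution and plated volume are hypothetical values chosen to reproduce the target concentration of about 5 x 10⁵ cfu per ml, and the 0.2 ml volume is the injection volume given in the following paragraph.

```python
def cfu_per_ml(colonies, fold_dilution, volume_plated_ml):
    """Concentration of the undiluted suspension in cfu per ml.

    fold_dilution is the total dilution applied before plating,
    for example 1000 for a 1:1000 dilution.
    """
    if volume_plated_ml <= 0 or fold_dilution <= 0:
        raise ValueError("dilution and volume must be positive")
    return colonies * fold_dilution / volume_plated_ml


def injected_cfu(concentration_cfu_per_ml, volume_ml):
    """Total number of cfu contained in the injected volume."""
    return concentration_cfu_per_ml * volume_ml


# Hypothetical plate: 50 colonies from 0.1 ml of a 1:1000 dilution.
conc = cfu_per_ml(50, 1000, 0.1)
print(f"{conc:.1e} cfu per ml, {injected_cfu(conc, 0.2):.1e} cfu per 0.2 ml")
```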

Groups of three female BALB/c mice (20-25g) were injected intraperitoneally with 0.2 ml of bacterial suspension
containing approximately 1 x 10⁵ cfu ml⁻¹. Mice were sacrificed three days post-inoculation and their spleens were
removed to recover bacteria. Half of each spleen was homogenized in 1 ml of sterile saline in a microfuge tube. Cellular
debris was allowed to settle and 1 ml of saline containing cells still in suspension was removed to a fresh tube and
centrifuged for two minutes in a microfuge. The supernatant was aspirated and the pellet resuspended in 1 ml of sterile
distilled water. A dilution series was made in sterile distilled water and 100 µl of each dilution was plated onto TYGPN
agar containing nalidixic acid (100 µg ml⁻¹) and kanamycin (50 µg ml⁻¹). Bacteria were recovered from plates containing
between 1000 and 4000 colonies, and a total of over 10,000 colonies recovered from each spleen were pooled and
used to prepare DNA for PCR generation of probes to screen colony blots.
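
The comparison that the colony blots support, namely finding mutants whose tags are detected in the inoculum pool but not in the pool recovered from the spleen, can be sketched as follows. This is a minimal illustration under stated assumptions: hybridisation signals are represented as hypothetical normalised intensities between 0 and 1, and the 0.5 threshold, the function name and the example values are invented for the sketch. Wells with a weak inoculum signal are skipped, since a missing signal in the recovered pool is then uninformative.

```python
def attenuated_candidates(inoculum_signal, recovered_signal, threshold=0.5):
    """Return wells with a clear inoculum signal and no clear recovered signal.

    Both arguments map a well name (e.g. 'A3') to a normalised
    hybridisation intensity. Wells absent from recovered_signal are
    treated as having no signal.
    """
    candidates = []
    for well, inoculum in sorted(inoculum_signal.items()):
        recovered = recovered_signal.get(well, 0.0)
        if inoculum >= threshold and recovered < threshold:
            candidates.append(well)
    return candidates


# Hypothetical intensities for four wells of one microtitre dish.
inoculum = {"A1": 0.90, "A3": 0.80, "B6": 0.20, "C5": 0.90}
spleen = {"A1": 0.85, "A3": 0.05, "B6": 0.10, "C5": 0.00}
print(attenuated_candidates(inoculum, spleen))
```

In this hypothetical data set wells A3 and C5 are reported, while B6 is excluded because its inoculum signal is itself weak.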

Virulence gene cloning and DNA sequencing 

Total DNA was isolated from S. typhimurium exconjugants and digested separately with SstI, SalI, PstI and SphI.
Digests were fractionated through agarose gels, transferred to Hybond N+ membranes (Amersham) and subjected to
Southern hybridisation analysis using the kanamycin resistance gene of pUT mini-Tn5Km2 as a probe. The probe was
labelled with digoxigenin (Boehringer-Mannheim) and chemiluminescence detection was carried out according to the
manufacturer's instructions. The hybridisation and washing conditions were as described above. Restriction enzymes
which gave rise to hybridising fragments in the 3-5 kb range were used to digest DNA for a preparative agarose gel,
and DNA fragments corresponding to the sizes of the hybridisation signals were excised from this, purified and ligated
into pUC18. Ligation reactions were used to transform E. coli DH5α to kanamycin resistance. Plasmids from
kanamycin-resistant transformants were purified by passage through an elutipD column and checked by restriction enzyme
digestion. Plasmid inserts were partially sequenced by the di-deoxy method (Sanger et al, 1977) using the -40 primer and
reverse sequencing primer (United States Biochemical Corporation) and the primers P6 (5'-CCTAGGCGGCCAGATCTGAT-3')
(SEQ ID No 6) and P7 (5'-GCACTTGTGTATAAGAGTCAG-3') (SEQ ID No 7) which anneal to the I and O termini
of Tn5, respectively. Nucleotide sequences and deduced amino acid sequences were assembled using the MacVector
3.5 software package run on a Macintosh SE/30 computer. Sequences were compared with the EMBL and Genbank
DNA databases using the UNIX/SUN computer system at the Human Genome Mapping Project Resource Centre,
Harrow, UK.

Results 

Tag Design 

The structure of the DNA tags is shown in Figure 1a. Each tag consists of a variable central region flanked by
"arms" of invariant sequence. The central region sequence ([NK]20) was designed to prevent the occurrence of sites
for the commonly used 6 bp-recognition restriction enzymes, but is sufficiently variable to ensure that statistically, the 
same sequence should only occur once in 2 x 10¹⁷ molecules (DNA sequencing of 12 randomly selected tags showed
that none shared more than 50% identity over the variable region). (N means any base (A, G, C or T) and K means G 
or T.) The arms contain KpnI sites close to the ends to facilitate the initial cloning step, and the HindIII sites bordering
the variable region were used to release radiolabelled variable regions from the arms prior to hybridisation analysis.
The arms were also designed such that primers P2 and P4 each contain only one guanine residue. Therefore during 
a PCR using these primers, only one cytosine will be incorporated into each newly synthesised arm, compared to an 
average of ten in the unique sequence. When radiolabelled dCTP is included in the PCR, an average of ten-fold more 
label will be present in the unique sequence compared with each arm. This is intended to minimise background
hybridisation signals from the arms, after they have been released from the unique sequences by digestion with HindIII.
Double stranded tags were ligated into the KpnI site of the mini-Tn5 transposon Km2, carried on plasmid pUT (de
Lorenzo & Timmis, 1994). Replication of this plasmid is dependent on the R6K-specified π product of the pir gene. It
carries the oriT sequence of the RP4 plasmid, permitting transfer to a variety of bacterial species (Miller & Mekalanos,
1988), and the tnp* gene needed for transposition of the mini-Tn5 element. The tagged mini-Tn5 transposons were
transferred to S. typhimurium by conjugation, and 288 exconjugants resulting from transposition events were stored
in the wells of microtitre dishes. Total DNA isolated from 12 of these was digested with EcoRV, and subjected to
Southern hybridisation analysis using the kanamycin resistance gene of the mini-Tn5 transposon as a probe. In each
case, the exconjugant had arisen as a result of a single integration of the transposon into a different site of the bacterial
genome (Figure 2).

Specificity and sensitivity studies 

We next determined the efficiency and uniformity of amplification of the DNA tags in PCRs involving pools of 
exconjugant DNAs as targets for the reactions. In an attempt to minimise unequal amplification of tags in the PCR, we 
determined the maximum quantity of DNA target that could be used in a 100 µl reaction, and the minimum number of 
PCR cycles, that resulted in products which could be visualised by ethidium bromide staining of an agarose gel (5 µg 
DNA and 20 cycles, respectively). 

S. typhimurium exconjugants which had reached stationary growth phase in microtitre dishes were combined, and 
used to extract DNA. This was subjected to a PCR using primers P2 and P4. PCR products of 80 bp were gel-purified 
and used as targets for a second PCR, using the same primers but with 32P-labelled dCTP. This resulted in over 60% 
of the radiolabelled dCTP being incorporated into the PCR products. The radiolabelled products were digested with 
HindIII and used to probe colony blotted DNA from their corresponding microtitre dishes. Of the 1510 mutants tested 
in this way, 358 failed to yield a clear signal on an autoradiogram following an overnight exposure of the colony blot. 
There are three potential explanations for this. Firstly, it is possible that a proportion of the transposons did not carry 
tags. However, by comparing the transformation frequencies resulting from ligation reactions involving the transposon 
in the presence or absence of tags, it seems unlikely that untagged transposons could account for more than approx- 
imately 0.5% of the total (see Materials and Methods). More probable causes are that the variable sequence was 
truncated in some of the tags, and/or that some of the sequences formed secondary structures, both of which might 
have prevented amplification. Mutants which failed to give clear signals were not included in further studies. The spe- 
cificity of the efficiently amplifiable tags was demonstrated by generating a probe from 24 colonies of a microtitre dish, 
and using it to probe a colony blot of 48 colonies, which included the 24 used to generate the probe. The lack of any 
hybridisation signal from the 24 colonies not used to generate the probe (Figure 3) shows that the hybridisation con- 
ditions employed were sufficiently stringent to prevent cross-hybridisation among labelled tags, and suggests that each 
exconjugant is not reiterated within a microtitre dish. 

There are further considerations in determining the maximum pool size that can be used as an inoculum in animal 
experiments. As the quantity of labelled tag for each transposon is inversely proportional to the complexity of the tag 
pool, there is a limit to the pool size above which hybridisation signals become too weak to be detected after overnight 
exposure of an autoradiogram. More importantly, as the complexity of the pool increases, so must the likelihood of 
failure of a virulent representative of the pool to be present in sufficient numbers, in the spleen of an infected animal, 
to produce enough labelled probe. We have not determined the upper limit for pool size in the murine model of salmo- 
nellosis that we have employed, but it must be in excess of 96. 

Virulence tests of the transposon mutants 

A total of 1152 uniquely tagged insertion mutants (from twelve microtitre dishes) were tested for virulence in BALB/c 
mice in twelve pools, each representing a 96-well microtitre dish. Animals received an intraperitoneal injection of 
approximately 10^3 cells of each of 96 transposon mutants of a microtitre dish (10^5 organisms in total). Three days after 
injection mice were sacrificed, and bacteria were recovered by plating spleen homogenates onto laboratory medium. 
Approximately 10,000 colonies recovered from each mouse were pooled and DNA was extracted. The tags present 
in this DNA sample were amplified and labelled by the PCR, and colony blots were probed and compared with the 
hybridisation pattern obtained using tags amplified from the inoculum (Figure 3). As a control, an aroA mutant of S. typhimurium 
was tagged and employed as one of the 96 mutants in the inoculum. This strain would not be expected to be recovered 
from the spleen because its virulence is severely attenuated (Buchmeier et al, 1993). Forty-one mutants were identified 
whose DNA hybridised to labelled tags from the inoculum but not to labelled tags from bacteria recovered from the 
spleen. The experiment was repeated and the same forty-one mutants were again identified. Two of these were the 
aroA mutant (one per pool), as expected. Another was an auxotrophic mutant (it failed to grow on minimal medium). 
All of the mutants had normal colony morphology. 
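
The selection step of the screen amounts to a comparison of two hybridisation patterns: a mutant is a candidate when its tag gives a signal with the probe made from the inoculum but not with the probe made from bacteria recovered from the spleen. A minimal sketch of that comparison follows. The well names and signal calls are hypothetical and serve only as an illustration; mutants that give no signal with the inoculum probe are excluded, as described above.

```python
def attenuated_candidates(inoculum_signal, recovered_signal):
    """Return tags detected with the inoculum probe but not with the recovered probe."""
    present_in = {tag for tag, seen in inoculum_signal.items() if seen}
    present_out = {tag for tag, seen in recovered_signal.items() if seen}
    return sorted(present_in - present_out)

# Hypothetical well positions and hybridisation calls, for illustration only.
inoculum = {"A1": True, "A2": True, "A3": True, "A4": False}
spleen = {"A1": True, "A2": False, "A3": True, "A4": False}

candidates = attenuated_candidates(inoculum, spleen)
print(candidates)
```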

Example 2: Cloning and partial characterisation of sequences flanking the transposon 

DNA was extracted from one of the mutants described in Example 1 (Pool 1, F10), digested with SstI, and subcloned 
on the basis of kanamycin resistance. The sequence of 450 bp flanking one end of the transposon was determined 
using primer 97. This sequence shows 80% identity to the E. coli clp (lon) gene, which encodes a heat-regulated 
protease (Figure 5). To our knowledge, this gene has not previously been implicated as a virulence determinant. 

Partial sequences of thirteen further Salmonella typhimurium virulence genes are shown in Figure 6 (sequences 
A2 to A9 and B1 to B5). Deduced amino acid sequences of P2D6, S4C3, P3F4, P7G2 and P9B7 bear similarities to 
a family of secretion-associated proteins that have been conserved throughout bacterial pathogens of animals and 
plants, and which are known in Salmonella as the inv family. In S. typhimurium the inv genes are required for bacterial 
invasion into intestinal tissue. The virulence of inv mutants is attenuated when they are inoculated by the oral route, 
but not when they are administered intraperitoneally. The discovery of inv-related genes that are required for virulence 
following intraperitoneal inoculation suggests a new secretion apparatus which might be required for invasion of non- 
phagocytic cells of the spleen and other organs. The products of these new genes might represent better drug targets 
than the inv proteins in the treatment of established infections. 

Further characterisation of the genes identified in this example is described in Example 4. 

Example 3: LD50 determinations and mouse vaccination study 

Mutations identified by the method of the invention attenuate virulence. 

Five of the mutations in genes not previously implicated in virulence were transferred by P22-mediated transduction 
to the nalidixic acid-sensitive parent strain of S. typhimurium 12023. Transductants were checked by restriction mapping 
then injected by the intraperitoneal route into groups of BALB/c mice to determine their 50% lethal dose (LD50). The 
LD50 values for mutants S4C3, P7G2, P3F4 and P9B7 were all several orders of magnitude higher than that of the 
wild-type strain. No difference in the LD50 was detected for mutant P1F10; however, there was a statistically significant 
decrease in the proportion of P1F10 cells recovered from the spleens of mice injected with an inoculum consisting of 
an equal proportion of this strain and the wild-type strain. This implies that this mutation does attenuate virulence, but 
to a degree that is not detectable by LD50. 
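
A mixed-inoculum comparison of this kind can be summarised as the ratio of mutant to wild type among recovered bacteria, divided by the same ratio in the inoculum. The sketch below is illustrative: the name competitive_index is our own label for this ratio and is not used in the text, and the colony counts are hypothetical rather than data from this study. A value below 1 indicates that the mutant is under-represented after passage through the animal.

```python
def competitive_index(mutant_in, wild_in, mutant_out, wild_out):
    """Mutant:wild-type ratio among recovered bacteria divided by the ratio in the inoculum."""
    if min(mutant_in, wild_in, wild_out) <= 0:
        raise ValueError("inoculum counts and recovered wild-type count must be positive")
    return (mutant_out / wild_out) / (mutant_in / wild_in)

# Hypothetical colony counts, for illustration only (not data from this study).
ci = competitive_index(mutant_in=500, wild_in=500, mutant_out=120, wild_out=480)
print(round(ci, 2))
```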

Mutants P3F4 and P9B7 were also administered by the oral route at an inoculum level of 10^7 cells/mouse. None 
of the mice became ill, indicating that the oral LD50 levels of these mutants are at least an order of magnitude higher 
than that of the wild-type strain. 

In the mouse vaccination study, groups of five female BALB/c mice of 20-25 g in mass were initially inoculated 
orally (p.o.) or intraperitoneally (i.p.) with serial ten-fold dilutions of Salmonella typhimurium mutant strains P3F4 and 
P9B7. After four weeks the mice were then inoculated with 500 c.f.u. of the parental wild type strain. Deaths were then 
recorded over four weeks. 

A group of two mice of the same age and batch as the mice inoculated with the mutant strains were also inoculated 
i.p. with 500 c.f.u. of the wild type strain as a positive control. Both non-immunised mice died as expected within four 
weeks. 

Results are tabulated below: 

1) p.o. initial inoculation with mutant strain P3F4 

initial inoculum    no. mice surviving    no. mice surviving 
in c.f.u.           first challenge       wild type challenge 
5 x 10^9            5                     2 (40%) 
5 x 10^8            5                     2 (40%) 
5 x 10^7            5                     0 (0%) 

2) i.p. initial inoculum with mutant strain P3F4 

initial inoculum    no. mice surviving    no. mice surviving 
in c.f.u.           first challenge       wild type challenge 
5 x 10^6            3                     3 (100%) 
5 x 10^5            5                     4 (80%) 
5 x 10^4            6                     5 (83%) 
5 x 10^3            5                     4 (80%) 

3) p.o. initial inoculum with mutant strain P9B7 

initial inoculum    no. mice surviving    no. mice surviving 
in c.f.u.           first challenge       wild type challenge 
5 x 10^9            5                     0 (0%) 

4) i.p. initial inoculum with mutant P9B7 

initial inoculum    no. mice surviving    no. mice surviving 
in c.f.u.           first challenge       wild type challenge 
5 x 10^6            4                     2 (50%) 

From these experiments I conclude that mutant P3F4 appears to give some protection against subsequent wild 
type challenge. This protection appears greater in mice that were immunised i.p. 



Example 4: Identification of a virulence locus encoding a second type III secretion system in Salmonella 
typhimurium 


Abbreviations used in this Example are VGC1, virulence gene cluster 1; VGC2, virulence gene cluster 2. 



Background to the experiments described 

Salmonella typhimurium is a principal agent of gastroenteritis in humans and produces a systemic illness in mice 
which serves as a model for human typhoid fever (1). Following oral inoculation of mice with S. typhimurium, the bacteria 
pass from the lumen of the small intestine through the intestinal mucosa, via enterocytes or M cells of the Peyer's patch 
follicles (2). The bacteria then invade macrophages and neutrophils, enter the reticuloendothelial system and dissem- 
inate to other organs, including the spleen and liver, where further reproduction results in an overwhelming and fatal 
bacteremia (3). To invade host cells, to survive and replicate in a variety of physiologically stressful intracellular and 
extracellular environments and to circumvent the specific antibacterial activities of the immune system, S. typhimurium 
employs a sophisticated repertoire of virulence factors (4). 

To gain a more comprehensive understanding of virulence mechanisms of S. typhimurium and other pathogens, 
we used the transposon mutagenesis system described in Example 1, conveniently called 'signature-tagged 
mutagenesis' (STM), which combines the strength of mutational analysis with the ability to follow simultaneously the fate of a 
large number of different mutants within a single animal (5 and Example 1; Reference 5 was published after the priority 
date for this invention). Using this approach we identified 43 mutants with attenuated virulence from a total of 1152 
mutants that were screened. The nucleotide sequences of DNA flanking the insertion points of transposons in 5 of 
these mutants showed that they were related to genes encoding type III secretion systems of a variety of bacterial 
pathogens (6, 7). The products of the inv/spa gene cluster of S. typhimurium (8, 9) are proteins that form a type III 
secretion system required for the assembly of surface appendages mediating entry into epithelial cells (10). Hence the 
virulence of strains carrying mutations in the inv/spa cluster is attenuated only if the inoculum is administered orally 
and not when given intraperitoneally (8). In contrast the 5 mutants identified by STM are avirulent following 
intraperitoneal inoculation (5). 

In this example we show that the transposon insertion points of these 5 mutants and an additional 11 mutants 
identified by STM all map to the same region of the S. typhimurium chromosome. Further analysis of this region reveals 
additional genes whose deduced products have sequence similarity to other components of type III secretion systems. 
This chromosomal region which we refer to as virulence gene cluster 2 (VGC2) is not present in a number of other 
enteric bacteria, and represents an important locus for S. typhimurium virulence. 






EP 0 889 120 A1 

Materials and Methods 

Bacterial Strains, Transduction and Growth Media. Salmonella enterica serotypes 5791 (aberdeen), 423180 
(gallinarum), 7101 (cubana) and 12416 (typhimurium LT2) were obtained from the National Collections of Type Cultures, 
Public Health Laboratory Service, UK. Salmonella typhi BRD123 genomic DNA was a gift from G. Dougan; enteropathogenic 
Escherichia coli (EPEC), enterohemorrhagic E. coli (EHEC), Vibrio cholerae biotype El Tor, Shigella flexneri 
serotype 2 and Staphylococcus aureus were clinical isolates obtained from the Department of Infectious Diseases and 
Bacteriology, Royal Postgraduate Medical School, UK. Genomic DNA from Yersinia pestis was a gift from J. Heesemann. 
However, genomic DNA can be isolated using standard methods. The bacterial strains and the methods used 
to generate signature-tagged mini-Tn5 transposon mutants of S. typhimurium NCTC strain 12023 have been described 
previously (5, 11). Routine propagation of plasmids was in E. coli DH5α. Bacteria were grown in LB broth (12) 
supplemented with the appropriate antibiotics. Before virulence levels of individual mutant strains were assessed, the 
mutations were first transferred by phage P22 mediated transduction (12) to the nalidixic acid sensitive parental strain of 
S. typhimurium 12023. Transductants were analysed by restriction digestion and Southern hybridisation before use as 
inoculum. 

Lambda Library Screening. Lambda (λ) clones with overlapping insert DNAs covering VGC2 were obtained by 
standard methods (13) from a λ1059 library (14) containing inserts from a partial Sau3A digest of S. typhimurium LT2 
genomic DNA. The library was obtained via K. Sanderson, from the Salmonella Genetic Stock Centre (SGSC), Calgary, 
Canada. 

Mud-P22 Lysogens. Radiolabeled DNA probes were hybridised to Hybond N (Amersham) filters bearing DNA 
prepared from lysates of a set of S. typhimurium strains harbouring Mud-P22 prophages at known positions in the S. 
typhimurium genome. Preparation of mitomycin-induced Mud-P22 lysates was as described (12, 15). The set of Mud-P22 
prophages was originally assembled by Benson and Goldman (16) and was obtained from the SGSC. 

Gel Electrophoresis and Southern Hybridisation. Gel electrophoresis was performed in 1% or 0.6% agarose 
gels run in 0.5 x TBE. Gel fractionated DNA was transferred to Hybond N or N+ membranes (Amersham) and stringent 
hybridisation and washing procedures (permitting hybridisation between nucleotide sequences with 10% or fewer 
mismatches) were as described by Holden et al, (17). For non-stringent conditions (permitting hybridisation between 
sequences with 50% mismatches) filters were hybridised overnight at 42°C in 10% formamide/0.25 M Na2HPO4/7% SDS 
and the most stringent step was with 20 mM Na2HPO4/1% SDS at 42°C. DNA fragments used as probes were labelled 
with [32P]dCTP using the 'Radprime' system (Gibco-BRL) or with [digoxigenin-11]dUTP and detected using the 
Digoxigenin system (Boehringer Mannheim) according to the manufacturers' instructions, except that hybridisation was 
performed in the same solution as that used for radioactively labelled probes. Genomic DNA was prepared for Southern 
hybridisation as described previously (13). 

Molecular Cloning and Nucleotide Sequencing. Restriction endonucleases and T4 DNA ligase were obtained 
from Gibco-BRL. General molecular biology techniques were as described in Sambrook et al, (18). Nucleotide 
sequencing was performed by the dideoxy chain termination method (19) using a T7 sequencing kit (Pharmacia). 
Sequences were assembled with the MacVector 3.5 software or AssemblyLIGN packages. Nucleotide and derived amino 
acid sequences were compared with those in the European Molecular Biology Laboratory (EMBL) and SwissProt 
databases using the BLAST and FASTA programs of the GCG package from the University of Wisconsin (version 8) (20) 
on the network service at the Human Genome Mapping Project Resource Centre, Hinxton, UK. 

Virulence Tests. Groups of five female BALB/c mice (20-25 g) were inoculated orally (p.o.) or intraperitoneally (i.p.) 
with 10-fold dilutions of bacteria suspended in physiological saline. For preparation of the inoculum, bacteria were 
grown overnight at 37°C in LB broth with shaking (50 rpm) and then used to inoculate fresh medium for various lengths 
of time until an optical density (OD) at 560 nm of 0.4 to 0.6 had been reached. For cell densities of 5 x 10 s colony 
forming units (cfu) per ml and above, cultures were concentrated by centrifugation and resuspended in saline. The 
concentration of cfu/ml was checked by plating a dilution series of the inoculum onto LB agar plates. Mice were 
inoculated i.p. with 0.2 ml volumes and p.o. by gavage with the same volume of inoculum. The LD50 values were calculated 
after 28 days by the method of Reed and Muench (21). 
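
The Reed and Muench calculation can be sketched as follows. This is an illustrative implementation of the general method, not the procedure as run for this study, and the dose groups shown are hypothetical. Deaths are accumulated from the lowest dose upwards and survivors from the highest dose downwards; the LD50 is then interpolated on a log10 dose scale between the two doses whose cumulative mortality brackets 50%.

```python
import math

def reed_muench_ld50(doses, deaths, survivors):
    """Estimate the 50% lethal dose by Reed-Muench cumulative interpolation."""
    rows = sorted(zip(doses, deaths, survivors))  # ascending dose
    n = len(rows)
    # An animal killed by a low dose is assumed to die at every higher dose.
    cum_dead = []
    total = 0
    for _, dead, _ in rows:
        total += dead
        cum_dead.append(total)
    # An animal surviving a high dose is assumed to survive every lower dose.
    cum_alive = [0] * n
    total = 0
    for i in range(n - 1, -1, -1):
        total += rows[i][2]
        cum_alive[i] = total
    mortality = [d / (d + a) for d, a in zip(cum_dead, cum_alive)]
    for i in range(n - 1):
        low, high = mortality[i], mortality[i + 1]
        if low <= 0.5 <= high and high > low:
            distance = (0.5 - low) / (high - low)  # proportionate distance
            log_low = math.log10(rows[i][0])
            log_high = math.log10(rows[i + 1][0])
            return 10 ** (log_low + distance * (log_high - log_low))
    raise ValueError("50% mortality is not bracketed by the tested doses")

# Hypothetical groups of five mice per ten-fold dilution, for illustration only.
ld50 = reed_muench_ld50(
    doses=[1e1, 1e2, 1e3, 1e4],
    deaths=[0, 1, 4, 5],
    survivors=[5, 4, 1, 0],
)
print(f"{ld50:.1f} cfu")
```

When no pair of adjacent doses brackets 50% mortality the function raises an error rather than extrapolating, which corresponds to reporting the LD50 only as a bound.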

Results 

Localisation of Transposon Insertions. The generation of a bank of Salmonella typhimurium mini-Tn5 transposon 
mutants and the screen used to identify 43 mutants with attenuated virulence have been described previously (5). 
Transposons and flanking DNA regions were cloned from exconjugants by selection for kanamycin resistance or by 
inverse PCR. Nucleotide sequences of 300-600 bp of DNA flanking the transposons were obtained for 33 mutants. 
Comparison of these sequences with those in the DNA and protein databases indicated that 14 mutants resulted from 
transposon insertions into previously known virulence genes, 7 arose from insertions into new genes with similarity to 
known genes of the enterobacteria and 12 resulted from insertions into sequences without similarity to entries in the 
DNA and protein databases (ref. 5, Example 1 and this Example). 

Three lines of evidence suggested that 16 of 19 transposon insertions into new sequences were clustered in three 
regions of the genome, initially designated A, B and C. First, comparing nucleotide sequences from regions flanking 
transposon insertion points with each other and with those in the databases showed that some sequences overlapped 
with one another or had strong similarity to different regions of the same gene. Second, Southern analysis of genomic 
DNA digested with several restriction enzymes and probed with restriction fragments flanking transposon insertion 
points indicated that some transposon insertions were located on the same restriction fragments. Third, when the same 
DNA probes were hybridised to plaques from a S. typhimurium λ DNA library, the probes from mutants which the 
previous two steps had suggested might be linked were found to hybridise to the same λ DNA clones. Thus two mutants 
(P9B7 and P12F5) were assigned to cluster A, five mutants (P2D6, P9B6, P11C3, P11D10 and P11H10) to cluster B 
and nine mutants (P3F4, P4F8, P7A3, P7B8, P7G2, P8G12, P9G4, P10E11 and P11B9) to cluster C (Figure 8). 

Hybridisation of DNA probes from these three clusters to lysates from a set of S. typhimurium strains harbouring 
locked-in Mud-P22 prophages (15, 16) showed that the three loci were all located in the minute 30 to 31 region (edition 
VIII, ref. 22) (Figure 7), indicating that the three loci were closely linked or constituted one large virulence locus. To 
determine if any of the λ clones covering clusters A, B and C contained overlapping DNA inserts, DNA fragments from 
the terminal regions of each clone were used as probes in Southern hybridisation analysis of the other λ clones. 
Hybridising DNA fragments showed that several λ clones overlap and that clusters A, B and C comprise one contiguous 
region (Figure 8). DNA fragments from the ends of this region were then used to probe the λ library to identify further 
clones containing inserts representing the adjacent regions. No λ clones were identified that covered the extreme right 
hand terminus of the locus, so this region was obtained by cloning a 6.5 kb EcoRI/XbaI fragment from a lysate of the 
Mud-P22 prophage strain TT15244 (16). 

Restriction mapping and Southern hybridisation analysis were then used to construct a physical map of this locus 
(Figure 8). To distinguish this locus from the well characterised inv/spa gene cluster at minute 63 (edition VIII, ref. 22) 
(8, 9, 23, 24, 25, 26), we refer to the latter as virulence gene cluster 1 (VGC1) and have termed the new virulence 
locus VGC2. Figure 2 shows the position of two portions of DNA whose nucleotide sequence has been determined 
("Sequence 1" and "Sequence 2"). The nucleotide sequence is shown in Figures 11 and 12. 

Mapping the boundaries of VGC2 on the S. typhimurium chromosome. Nucleotide sequencing of λ clone 7 
at the left hand side of VGC2 revealed the presence of an open reading frame (ORF) whose deduced amino acid 
sequence is over 90% identical to the derived product of a segment of the ydhE gene of E. coli, and sequencing of 
the 6.5 kb EcoRI/XbaI cloned fragment on the right hand side of VGC2 revealed the presence of an ORF whose 
predicted amino acid sequence is over 90% identical to pyruvate kinase I of E. coli encoded by the pykF gene (27). 
On the E. coli chromosome ydhE and pykF are located close to one another, at minute 37 to 38 (28). Eleven 
non-overlapping DNA fragments distributed along the length of VGC2 were used as probes in non-stringent Southern 
hybridisation analysis of E. coli and S. typhimurium genomic DNA. Hybridising DNA fragments showed that a region 
of approximately 40 kb comprising VGC2 was absent from the E. coli genome and localised the boundaries of VGC2 
to within 1 kb (Figure 9). Comparison of the location of the XbaI site close to the right hand end of VGC2 (Figure 8) 
with a map of known XbaI sites (29) at the minute 30 region of the chromosome (22) enables a map position of 30.7 
minutes to be deduced for VGC2. 

Structure of VGC2. Nucleotide sequencing of portions of VGC2 has revealed the presence of 19 ORFs (Figure 
8). The G+C content of approximately 26 kb of nucleotide sequence within VGC2 is 44.6%, compared to 47% for VGC1 
(9) and 51-53% estimated for the entire Salmonella genome (30). 
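
G+C content of the kind quoted above is the percentage of G and C residues among the bases of a sequence. The following sketch shows the calculation on a short, hypothetical fragment; it is not the VGC2 sequence, and the handling of ambiguous bases (ignored here) is our assumption.

```python
def gc_content(sequence):
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)

example = "ATGCGCATTAANNGC"  # hypothetical fragment, for illustration only
print(round(gc_content(example), 1))
```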

The complete deduced amino acid sequences of ORFs 1-11 are similar to those of proteins of type III secretion 
systems (6, 7), which are known to be required for the export of virulence determinants in a variety of bacterial pathogens 
of plants and animals (7). The predicted proteins of ORFs 1-8 (Figure 8) are similar in organisation and sequence to 
the products of the yscN-U genes of Yersinia pseudotuberculosis (31), to invC/spaS of the inv/spa cluster in VGC1 of 
Salmonella typhimurium (8, 9) and to spa47/spa40 of the spa/mxi cluster of Shigella flexneri (32, 33, 34, 35). For 
example the predicted amino acid sequence of ORF 3 (Figure 8) is 50% identical to YscS of Y. pseudotuberculosis 
(31), 34% identical to Spa9 from S. flexneri (35) and 37% identical to SpaQ of VGC1 of S. typhimurium (9). The predicted 
protein product of ORF 9 is closely related to the LcrD family of proteins with 43% identity to LcrD of Y. enterocolitica 
(36), 39% identity to MxiA of S. flexneri (32) and 40% identity to InvA of VGC1 (23). Partial nucleotide sequences for 
the remaining ORFs shown in Figure 8 indicate that the predicted protein from ORF 10 is most similar to Y. enterocolitica 
YscJ (37), a lipoprotein located in the bacterial outer membrane, with ORF 11 similar to S. typhimurium InvG, a member 
of the PulD family of translocases (38). ORF 12 and ORF 13 show significant similarity to the sensor and regulatory 
subunits respectively, from a variety of proteins comprising two component regulatory systems (39). There is ample 
coding capacity for further genes between ORFs 9 and 10, ORFs 10 and 11, and between ORF 19 and the right hand 
end of VGC2. 

VGC2 is conserved among and is specific to the Salmonellae. A 2.2 kb PstI/HindIII fragment located at the 
centre of VGC2 (probe B, Figure 8) lacking sequence similarity to entries in the DNA and protein databases was used 
as a probe in Southern hybridisation analysis of genomic DNA from Salmonella serovars and other pathogenic bacteria 
(Figure 10A). DNA fragments hybridising under non-stringent conditions showed that VGC2 is present in S. aberdeen, 
S. gallinarum, S. cubana and S. typhi and is absent from EPEC, EHEC, Y. pestis, S. flexneri, V. cholerae and S. aureus. 
Thus VGC2 is conserved among and is likely to be specific to the Salmonellae. 

To determine if the organisation of the locus is conserved among the Salmonella serovars tested, stringent Southern 
hybridisations with genomic DNA digested with two further restriction enzymes were carried out. Hybridising DNA 
fragments showed that there is some heterogeneity in the arrangement of restriction sites between S. typhimurium 
LT2 and S. gallinarum, S. cubana and S. typhi (Figure 10B). Furthermore, S. gallinarum and S. typhi contain additional 
hybridising fragments to those present in the other Salmonellae examined, suggesting that regions of VGC2 have been 
duplicated in these species. 

VGC2 is required for virulence in mice. Previous experiments showed that the LD50 values for i.p. inoculation 
of transposon mutants P3F4, P7G2, P9B7 and P11C3 were at least 100-fold greater than that of the wild type strain (5). In 
order to clarify the importance of VGC2 in the process of infection, the p.o. and i.p. LD50 values for mutants P3F4 and 
P9B7 were determined (Table 1). Both mutants showed a reduction in virulence of at least five orders of magnitude by 
either route of inoculation in comparison with the parental strain. This profound attenuation of virulence by both routes 
of inoculation demonstrates that VGC2 is required for events in the infective process after epithelial cell penetration in 
BALB/c mice. 



Table 1. LD50 values of S. typhimurium strains. 

                    LD50 (cfu) 
Strain              i.p.            p.o. 
12023 wild type     4.2             6.2 x 10^4 
P3F4                1.5 x10 s       >5 x 10^9 
P9B7                >1.5 x 10^6     >5 x 10^9 

cfu, colony forming units 


Discussion 

A hitherto unknown virulence locus in S. typhimurium of approximately 40 kb, located at minute 30.7 on the 
chromosome, has been identified by mapping the insertion points of a group of signature-tagged transposon mutants with 
attenuated virulence (5). This locus is referred to as virulence gene cluster 2 (VGC2) to distinguish it from the inv/spa 
virulence genes at 63 minutes (edition VIII, ref. 22) which we suggest be renamed VGC1. VGC1 and VGC2 both encode 
components of type III secretion systems. However, these secretion systems are functionally distinct. 

Of 19 mutants that arose from insertions into new genes (ref. 5 and this example) 16 mapped to the same region 
of the chromosome. It is possible that mini-Tn5 insertion occurs preferentially in VGC2. Alternatively, as the negative 
selection used to identify mutants with attenuated virulence (5) was very stringent (reflected by the high LD50 values 
for VGC2 mutants) it is possible that, among the previously unknown genes, only mutations in those of VGC2 result 
in a degree of attenuation sufficient to be recovered in the screen. The failure of previous searches for S. typhimurium 
virulence determinants to identify VGC2 might stem from reliance on cell culture assays rather than a live animal model 
of infection. A previous study which identified regions of the S. typhimurium LT2 chromosome unique to Salmonellae 
(40) located one such region (RF333) to minutes 30.5 - 32. Therefore, RF333 may correspond to VGC2, although it 
was not known that RF333 was involved in virulence determination. 

Comparisons with the type III secretion systems encoded by the virulence plasmids of Yersinia and Shigella as 
well as with VGC1 of Salmonella indicate that VGC2 encodes the basic structural components of the secretory 
apparatus. Furthermore, the order of ORFs 1-8 in VGC2 is the same as the gene order in homologues in Yersinia, Shigella 
and VGC1 of S. typhimurium. The fact that the organisation and structure of the VGC2 secretion system is no more 
closely related to VGC1 than to the corresponding genes of Yersinia, together with the low G+C content of VGC2, 
suggests that VGC2, like VGC1 (40, 41, 42), was acquired independently by S. typhimurium via horizontal transmission. 
The proteins encoded by ORFs 12 and 13 show strong similarity to bacterial two component regulators (39) and could 
regulate ORFs 1-11 and/or the secreted proteins of this system. 

Many genes in VGC1 have been shown to be important for entry of S. typhimurium into epithelial cells. This process 
requires bacterial contact (2) and results in cytoskeletal rearrangements leading to localised membrane ruffling (43, 
44). The role of VGC1 and its restriction to this stage of the infection is reflected in the approximately 50-fold attenuation 
of virulence in BALB/c mice inoculated p.o. with VGC1 mutants and by the fact that VGC1 mutants show no loss of 
virulence when administered i.p. (8). The second observation also explains why no VGC1 mutants were obtained in 
our screen (5). In contrast, mutants in VGC2 are profoundly attenuated following both p.o. and i.p. inoculation. This 
shows that, unlike VGC1, VGC2 is required for virulence in mice after epithelial cell penetration, but these findings do 
not exclude a role for VGC1 in this early stage of infection. 

Thus in summary mapping the insertion points of 16 signature-tagged transposon mutants on the Salmonella 
typhimurium chromosome led to the identification of a 40 kb virulence gene cluster at minute 30.7. This locus is 
conserved among all other Salmonella species examined, but not present in a variety of other pathogenic bacteria or in 
Escherichia coli K12. Nucleotide sequencing of a portion of this locus revealed 11 open reading frames whose predicted 
proteins encode components of a type III secretion system. To distinguish between this and the type III secretion system 
encoded by the inv/spa invasion locus we refer to the inv/spa locus as virulence gene cluster 1 (VGC1) and the new 
locus as VGC2. VGC2 has a lower G+C content than that of the Salmonella genome and is flanked by genes whose 
products share greater than 90% identity with those of the E. coli ydhE and pykF genes. Thus VGC2 was probably 
acquired horizontally by insertion into a region corresponding to that between the ydhE and pykF genes of E. coli. 
Virulence studies of VGC2 mutants have shown them to be attenuated by at least five orders of magnitude compared 
with the wild type strain following oral or intraperitoneal inoculation. 

Example 5: Identification of virulence genes in Streptococcus pneumoniae 

(a) Mutagenesis 

In the absence of a convenient transposon system, the most efficient way of creating tagged mutants of 
Streptococcus pneumoniae is to use insertion-duplication mutagenesis (Morrison et al (1984) J. Bacteriol. 159, 870). Random 
S. pneumoniae DNA fragments of 200-400 bp will be generated by genomic DNA digestion with a restriction enzyme 
or by physical shearing by sonication followed by gel fractionation and DNA end-repair using T4 DNA polymerase. The 
fragments are ligated into plasmid pJDC9 (Pearce et al (1993) Mol. Microbiol. 9, 1037, which carries the erm gene for 
erythromycin selection in E. coli and S. pneumoniae), previously modified by incorporation of DNA sequence tags into 
one of the polylinker cloning sites. The size of cloned S. pneumoniae DNA is sufficient to ensure homologous 
recombination, and reduces the possibility of generating an unrepresentative library in E. coli (expression of S. pneumoniae 
proteins can be toxic to E. coli). Alternative vectors carrying different selectable markers are available and can be used 
in place of pJDC9. Tagged plasmids carrying DNA fragments are introduced to an appropriate S. pneumoniae strain 
selected on the basis of serotype and virulence in a murine model of pneumococcal pneumonia. Regulation of com- 
petence for genetic transformation in S. pneumoniae is governed by competence factor, a peptide of 17 amino acids 
which has been characterized recently by Don Morrison's group at the University of Illinois at Chicago and which is 
described Havarstein, Coomaraswamy and Morrison (1995) Proa Natl. Acad. Sci USA92, 11140-11144. Incorporation 
of minute quantities of this peptide in transformation experiments leads to very efficient transformation frequencies in 
some encapsulated clinical isolates of S. pneumoniae. This overcomes a major hurdle in pneumococcal molecular 
genetics and the availability of the peptide greatly facilitates the construction of S. pneumoniae mutant banks and 
allows flexibility in choosing the strain(s) to be mutated. A proportion of transformants are analysed to verify homologous 
integration of the plasmid sequences, and checked for stability. The very low level of reversion associated with mutants 
generated by insertion-duplication is minimized by the fact that the duplicated regions will be short (200-400 bp); how- 
ever if the level of reversion is unacceptably high, antibiotic selection is maintained during growth of the transformants 
in culture and during growth in the animal. 
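
The fragment preparation described above (restriction digestion followed by selection of 200-400 bp fragments) can be mimicked in silico as a rough planning aid. This is a sketch only: the functions digest and size_select are hypothetical helpers, the genome is treated as a linear string, and the recognition site "GATC" cut before the first base is used purely as an illustrative choice, not one named in this document.

```python
def digest(genome: str, site: str, cut_offset: int = 0) -> list[str]:
    """Cut a linear sequence at each occurrence of `site`.

    The cut falls `cut_offset` bases into the recognition site.
    """
    fragments = []
    start = 0
    pos = genome.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(genome[start:cut])
        start = cut
        pos = genome.find(site, pos + 1)
    fragments.append(genome[start:])
    return [frag for frag in fragments if frag]  # drop empty fragments


def size_select(fragments: list[str], min_bp: int = 200, max_bp: int = 400) -> list[str]:
    """Keep fragments within the size window used for insertion-duplication."""
    return [frag for frag in fragments if min_bp <= len(frag) <= max_bp]


# Made-up toy sequence with two recognition sites.
toy_genome = "A" * 250 + "GATC" + "T" * 100 + "GATC" + "C" * 300
selected = size_select(digest(toy_genome, "GATC"))
```

In this toy case two of the three fragments fall inside the 200-400 bp window; the 104 bp fragment is discarded, just as gel fractionation would exclude it.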

(b) Animal model 

The S. pneumoniae mutant bank is organized into pools for inoculation into Swiss and/or C57BL/6 mice. Preliminary
experiments are conducted to determine the optimum complexity of the pools and the optimum inoculum level. One
attractive model utilises inocula of 10^5 cfu, delivered by mouth to the trachea (Veber et al (1993) J. Antimicrobial
Chemotherapy 32, 473). Swiss mice develop acute pneumonia within 3-4 days, and C57BL/6 mice develop subacute
pneumonia within 8-10 days. These pulmonary models of infection yield 10^8 cfu/lung (Veber et al (1993) J. Antimicrobial
Chemotherapy 32, 473) at the time of death. If required, mice are also injected intraperitoneally for the identification
of genes required for bloodstream infection (Sullivan et al (1993) Antimicrobial Agents and Chemotherapy 37, 234).
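
When choosing pool complexity and inoculum level, a simple back-of-envelope calculation may help. The sketch below assumes that every mutant is equally represented in the pool and that the number of cells of a given mutant in the inoculum is Poisson distributed; both are assumptions of this illustration rather than statements from the text, and the pool size of 96 is a hypothetical value.

```python
import math


def mean_cfu_per_mutant(inoculum_cfu: float, pool_size: int) -> float:
    """Average number of cfu of each mutant in the inoculum (equal representation assumed)."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return inoculum_cfu / pool_size


def prob_absent_by_chance(inoculum_cfu: float, pool_size: int) -> float:
    """Poisson probability that a given mutant is missing from the inoculum altogether."""
    return math.exp(-mean_cfu_per_mutant(inoculum_cfu, pool_size))


# Hypothetical example: 10^5 cfu inoculum, pool of 96 mutants.
per_mutant = mean_cfu_per_mutant(1e5, 96)
```

The calculation only addresses representation in the inoculum; bottlenecks during infection can remove mutants at random and have to be assessed experimentally.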

(c) Virulence gene identification 

Once the parameters of the infection model are optimized, a mutant bank consisting of several thousand strains
is subjected to virulence tests. Mutants with attenuated virulence are identified by hybridisation analysis, using labelled
tags from the 'input' and 'recovered' pools as probes. If S. pneumoniae DNA cannot be colony blotted easily,
chromosomal DNA is liberated chemically or enzymatically in the wells of microtitre dishes prior to transfer onto nylon
membranes using a dot-blot apparatus. DNA flanking the integrated plasmid is cloned by plasmid rescue in E. coli (Morrison
et al (1984) J. Bacteriol. 159, 870), and sequenced. Genomic DNA libraries are constructed in appropriate vectors
maintained in either E. coli or a Gram-positive host strain, and are probed with restriction fragments flanking the
integrated plasmid to isolate cloned virulence genes, which are then fully sequenced and subjected to detailed functional
analysis.
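
The comparison of 'input' and 'recovered' hybridisation patterns can be sketched as follows. The function name, the signal dictionaries and the two thresholds are hypothetical; in practice signals would come from densitometry or visual scoring of the blots, and the cut-offs would be set empirically.

```python
def attenuated_candidates(
    input_signal: dict[str, float],
    recovered_signal: dict[str, float],
    input_min: float = 1.0,
    recovered_max: float = 0.1,
) -> list[str]:
    """Return mutants whose tag hybridises to the input probe but not to the recovered probe."""
    return sorted(
        mutant
        for mutant, signal in input_signal.items()
        if signal >= input_min and recovered_signal.get(mutant, 0.0) <= recovered_max
    )


# Made-up signals keyed by microtitre well position.
input_pool = {"A1": 5.0, "A2": 4.0, "A3": 0.2}
recovered_pool = {"A1": 3.0, "A2": 0.05}
candidates = attenuated_candidates(input_pool, recovered_pool)
```

Mutants with a weak input signal (such as "A3" here) are excluded rather than scored as attenuated, since their absence from the recovered pool would be uninformative.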

Example 6: Identification of virulence genes in Enterococcus faecalis

(a) Mutagenesis 

Mutagenesis of E. faecalis is accomplished using plasmid pAT112 or a derivative, developed for this purpose.
pAT112 carries genes for selection in both Gram-negative and Gram-positive bacteria, and the att site of Tn1545. It
therefore requires the presence in the host strain of the integrase for transposition, and stable, single copy insertions
are obtained if the host does not contain an excisionase gene (Trieu-Cuot et al (1991) Gene 106, 21). Recovery of
DNA flanking the integrated plasmid is accomplished by restriction digestion of genomic DNA, intramolecular ligation
and transformation of E. coli. The presence of single sites for restriction enzymes in pAT112 and its derivatives
(Trieu-Cuot et al (1991) Gene 106, 21) allows the incorporation of DNA sequence tags prior to transfer to a virulent
strain of E. faecalis carrying plasmid pAT145 (to provide the integrase function) by either conjugation, electroporation
or transformation (Trieu-Cuot et al (1991) Gene 106, 21; Wirth et al (1986) J. Bacteriol. 165, 831).

(b) Animal model 

A large number of insertion mutants are analysed for random integration of the plasmid by isolating DNA from
transcipients, restriction enzyme digestion and Southern hybridisation. Individual mutants are stored in the wells of
microtitre dishes, and complexity and size of pooled inocula are optimised prior to screening of the mutant bank. Two
different models of infection caused by E. faecalis are employed. The first is a well established rat model of endocarditis,
involving tail vein injection of up to 10^8 cfu of E. faecalis into animals that have a catheter inserted across the aortic
valve (Whitman et al (1993) Antimicrobial Agents and Chemotherapy 37, 1069). Animals are sacrificed at various times
after inoculation, and bacterial vegetations on the aortic valve are excised, homogenized and plated to culture medium
to recover bacterial colonies. Virulent bacteria are also recovered from the blood at various times after inoculation. The
second model is of peritonitis in mice, following intraperitoneal injection of up to 10^9 cfu of E. faecalis (Chenoweth et
al (1990) Antimicrobial Agents and Chemotherapy 34, 1800). As with the S. pneumoniae model, preliminary experiments
are done to establish the optimum complexity of the pools and the optimum inoculum level, prior to screening
the mutant bank.

(c) Virulence gene identification 

Isolation of DNA flanking the site of integration of pAT112 using its E. coli origin of replication is simplified by the
lack of sites for most of the commonly used 6 bp recognition restriction enzymes in the vector. Therefore DNA from
the strains of interest is digested with one of these enzymes, self-ligated, transformed into E. coli and sequenced
using primers based on the sequences adjacent to the att sites on the plasmid. A genomic DNA library of E. faecalis
is probed with sequences of interest to identify intact copies of virulence genes, which are then sequenced.
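
Choosing an enzyme for plasmid rescue amounts to finding enzymes with no recognition site in the vector. A minimal sketch of that check is given below, assuming a circular vector sequence is available as a string. The function names and the toy vector sequence are hypothetical and do not represent pAT112; the three recognition sequences are the standard ones for EcoRI, BamHI and HindIII.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def non_cutting_enzymes(vector_seq: str, recognition_sites: dict[str, str]) -> list[str]:
    """Names of enzymes whose site is absent from both strands of a circular vector."""
    vector = vector_seq.upper()
    absent = []
    for name, site in recognition_sites.items():
        site = site.upper()
        # Append the start of the vector so sites spanning the origin are found.
        circular = vector + vector[: len(site) - 1]
        if site not in circular and reverse_complement(site) not in circular:
            absent.append(name)
    return sorted(absent)


six_cutters = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
usable = non_cutting_enzymes("AAGAATTCAA", six_cutters)  # toy vector, not pAT112
```

Enzymes returned by such a check cut only in the flanking genomic DNA, so the rescued circle retains the intact vector together with the adjacent chromosomal sequence.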

Example 7: Identification of virulence genes in Pseudomonas aeruginosa 

(a) Mutagenesis 

Since transposon Tn5 has been used by others to mutagenise Pseudomonas aeruginosa, and the mini-Tn5 derivative
that was used for the identification of Salmonella typhimurium virulence genes (Example 1) is reported to have
broad utility among Gram-negative bacteria, including several pseudomonads (DeLorenzo and Timmis (1994)
Methods Enzymol. 264, 386), a P. aeruginosa mutant bank is constructed using our existing pool of signature-tagged
mini-Tn5 transposons by conjugal transfer of the suicide vector to one or more virulent (and possibly mucoid) recipient
strains. This approach represents a significant time saving. Other derivatives of Tn5 designed specifically for
P. aeruginosa mutagenesis (Rella et al (1985) Gene 33, 293) may alternatively be employed with the mini-Tn5 transposon.

(b) Animal model and virulence gene identification 

The bank of P. aeruginosa insertion mutants is screened for attenuated virulence in a chronic pulmonary infection
model in rats. Suspensions of P. aeruginosa cells are introduced into a bronchus following tracheotomy, and disease
develops over a 30 day period (Woods et al (1982) Infect. Immun. 36, 1223). Bacteria are recovered by plating lung
homogenates to laboratory medium and sequence tags from these are used to probe DNA colony blots of bacteria
used as the inoculum. It is also possible to subject the mutant bank to virulence tests in a model of endogenous
bacteremia (Hirakata et al (1992) Antimicrobial Agents and Chemotherapy 36, 1198), and cystic fibrosis (Davidson et
al (1995) Nature Genetics 9, 351) in mice. Cloning and sequencing of DNA flanking the transposons is done as
described in Example 1. Genomic DNA libraries for the isolation and sequencing of intact copies of the genes are
constructed in the laboratory by standard methods.

Example 8: Identification of virulence genes in Aspergillus fumigatus 

(a) Mutagenesis 

The functional equivalent of transposon mutagenesis in fungi is restriction enzyme mediated integration (REMI)
of transforming DNA (Schiestl and Petes (1991) Proc. Natl. Acad. Sci. 88, 7585). In this process, fungal cells are
transformed with DNA fragments carrying a selectable marker in the presence of a restriction enzyme, and single copy
integrations occur at different genomic sites, defined by the target sequence of the restriction enzyme. REMI has
already been used successfully to isolate virulence genes of Cochliobolus (Lu et al (1994) Proc. Natl. Acad. Sci. USA
91, 12649) and Ustilago (Bolker et al (1995) Mol. Gen. Genet. 248, 547), and it has been shown that incorporation of active
restriction enzyme with a plasmid encoding hygromycin resistance leads to single and apparently random integration
of the linear plasmid into the A. fumigatus genome. Sequence tags are introduced into a convenient site in one of two
vectors for hygromycin resistance, and used to transform a clinical isolate of A. fumigatus.

(b) Animal model and virulence gene identification 

The low-dose model of aspergillosis in neutropenic mice in particular closely matches the course of pulmonary
disease in humans (Smith et al (1994) Infect. Immun. 62, 5247). Mice are inoculated intranasally with up to 1,000,000
conidiospores/mouse, and virulent fungal mutants are recovered 7-10 days later by using lung homogenates to
inoculate liquid medium. Hyphae are collected after a few hours, from which DNA is extracted for amplification and labelling
of tags to probe colony blots of DNA from the pool of transformants comprising the inoculum. DNA from the regions
flanking the REMI insertion points is cloned by digesting the transformant DNA with a restriction enzyme that cuts
outside the REMI vector, self-ligation and transformation of E. coli. Primers based on the known sequence of the
plasmid are used to determine the adjacent A. fumigatus DNA sequences. To prove that the insertion of the vector was
the cause of the avirulent phenotype, the recovered plasmid is recut with the same restriction enzyme used for cloning,
and transformed back into the wild-type A. fumigatus parent strain. Transformants that have arisen by homologous
recombination are then subjected to virulence tests.

Claims

1. A microorganism having a reduced adaptation to a particular environment obtained using a method comprising 
the steps of: 

(1) providing a plurality of microorganisms each of which is independently mutated by the insertional inacti- 
vation of a gene with a nucleic acid comprising a unique marker sequence so that each mutant contains a 
different marker sequence, or clones of the said microorganism; 

(1A) optionally removing auxotrophs from the plurality of mutants produced in step (1); 

(2) providing individually a stored sample of each mutant produced by step (1) and providing individually stored
nucleic acid comprising the unique marker sequence from each individual mutant; 

(3) introducing a plurality of mutants produced by step (1) into the said particular environment and allowing 
those microorganisms which are able to do so to grow in the said environment; 

(4) retrieving microorganisms from the said environment or a selected part thereof and isolating the nucleic 
acid from the retrieved microorganisms; 

(5) comparing any marker sequences in the nucleic acid isolated in step (4) to the unique marker sequence 
of each individual mutant stored as in step (2); 

(6) selecting an individual mutant which does not contain any of the marker sequences as isolated in step (4); 
and 

(6A) optionally determining whether the mutant selected in step (6) is an auxotroph. 

2. A gene which allows a microorganism to adapt to a particular environment obtained using a method comprising 
the steps of: 

(1) providing a plurality of microorganisms each of which is independently mutated by the insertional inacti- 
vation of a gene with a nucleic acid comprising a unique marker sequence so that each mutant contains a 
different marker sequence, or clones of the said microorganism; 

(1A) optionally removing auxotrophs from the plurality of mutants produced in step (1);

(2) providing individually a stored sample of each mutant produced by step (1) and providing individually stored
nucleic acid comprising the unique marker sequence from each individual mutant; 

(3) introducing a plurality of mutants produced by step (1) into the said particular environment and allowing 
those microorganisms which are able to do so to grow in the said environment; 

(4) retrieving microorganisms from the said environment or a selected part thereof and isolating the nucleic 
acid from the retrieved microorganisms; 

(5) comparing any marker sequences in the nucleic acid isolated in step (4) to the unique marker sequence 
of each individual mutant stored as in step (2); 

(6) selecting an individual mutant which does not contain any of the marker sequences as isolated in step (4);
and 

(6A) optionally determining whether the mutant selected in step (6) is an auxotroph; 

(7) isolating the insertionally-inactivated gene from the individual mutant selected in step (6); 

(8) isolating from a wild-type microorganism the corresponding wild-type gene using the insertionally-inactivated
gene isolated in step (7) as a probe.

3. A microorganism comprising a mutation in a gene as defined in Claim 2. 

4. A microorganism according to Claim 1 or 3 for use in a vaccine.

5. A vaccine comprising a microorganism according to Claim 1 or 3 and a pharmaceutically-acceptable carrier. 

6. A gene according to Claim 2 which is isolated from the Salmonella typhimurium genome and hybridises to the
sequence shown in Figure 5 under stringent conditions.

7. A gene according to Claim 2 which is isolated from the Salmonella typhimurium genome and hybridises to a se- 
quence shown in Figure 6 under stringent conditions. 

8. A polypeptide encoded by a gene according to any one of Claims 2, 6 or 7.

9. A method of identifying a compound which reduces the ability of a microorganism to adapt to a particular environ- 
ment comprising the step of selecting a compound which interferes with the function of a gene according to any 
one of Claims 2, 6 or 7 or a polypeptide according to Claim 8. 

10. A compound identifiable by the method of Claim 9. 

11. A compound according to Claim 10 wherein the particular environment is a host organism. 

12. A compound according to Claim 11 wherein the host organism is a plant.

13. A compound according to Claim 11 wherein the host organism is an animal. 

14. Use of a compound according to any one of Claims 11 to 13 for treating infection of said host organism with said 
microorganism.

15. A molecule which selectively interacts with, and substantially inhibits the function of, a gene according to any one 
of Claims 2, 6 or 7 or a nucleic acid product thereof. 

16. A molecule according to Claim 15 which is an antisense nucleic acid or nucleic acid derivative.

17. A molecule according to Claim 15 or 16 which is an antisense oligonucleotide. 

18. A molecule according to any one of Claims 15 to 17 for use in medicine. 

19. Use of an effective amount of a molecule or compound according to Claim 11 or 15, wherein said gene is present 
in a microorganism or a close relative of said microorganism, in a manufacture of a medicament for treating a host 
which has, or is susceptible to, an infection with said microorganism. 

20. A pharmaceutical composition comprising a molecule or compound according to Claim 13 or 15 and a
pharmaceutically acceptable carrier.

21. The VGC2 DNA of Salmonella typhimurium or a part thereof, or a variant of said DNA or a variant of a part thereof. 

22. A mutant bacterium wherein if the bacterium normally contains a gene that is the same as or equivalent to a gene
in VGC2, said gene is mutated or absent in said mutant bacterium. 

23. A method of making a bacterium according to Claim 22. 

24. Use of a mutant bacterium according to Claim 22 in a vaccine.

25. A pharmaceutical composition comprising a bacterium according to Claim 22 and a pharmaceutically acceptable
carrier. 

26. A polypeptide encoded by VGC2 DNA of Salmonella typhimurium or a part thereof, or a variant of said polypeptide 
or a variant of a part thereof. 

27. A method of identifying a compound which reduces the ability of a bacterium to infect or cause disease in a host 
comprising the step of selecting a compound which interferes with the function of a gene in VGC2 according to 
Claim 21 or a polypeptide according to Claim 26. 

28. A compound identifiable by the method of Claim 27. 

29. A molecule which selectively interacts with, and substantially inhibits the function of, a gene in VGC2 of Salmonella 
typhimurium or a nucleic acid product thereof.

30. A molecule or compound according to Claim 28 or 29 for use in medicine. 
[BLAST output showing nucleotide similarity, on the minus strand, between the query sequence and E. coli clpP-region
database entries, including entry ECCLPXGN (E. coli clpX gene, complete CDS) and an entry cited to J Biol Chem 265,
12536 (1990). The aligned region is annotated "clpP gene". Reported high-scoring segment pairs:

Score (100.6 bits), Expect = 1.6e-20, Identities = 88/107 (82%), Positives = 88/107 (82%), Strand = Minus
Score = 231 (63.8 bits), Expect = 4.0e-24, Identities = 55/66 (83%), Positives = 55/66 (83%), Strand = Minus]

Figure 5 
A) new virulence factors with similarity to sequenced genes:

1. pirio 

similarity to clpP (E.coli) 
(Figure 5 of application) 

2. p2D6 

similarity to lcrD (Yersinia spp.) 
sequence p2D6_l_I 
CXTlCmATGTACGGGCM 

TTGGCTGGCATCCCATCAAGCGAGAAACGTGCGCTAACTTCCGCCA^ 
ACAATAAATTGCACGATAGTAATGATGGTAAATACGACCAAC 

AACTTACCGAAAGCATCCAC^AATATTACCGGCATTATGTTGTAACAGTACCCAGCCCT 
TTGGGGAGTTAACAACCGATTTAT 



3. s4C3



probably same gene as p2D6, but different region 

similarity to S. typhimurium invA and Yersinia spp. lcrD
sequence s4C3_l_U 

GCGCGGACGCTAGTGTGGTGGCTGACAGCCAGACGTTACCGAACGGGATGGGGCAGA 

CAAAAGACATGGCCCATAAGGCGCAAGGTTTTCXX^CTGGACGTTTTCGCGG 

GTCTTATTAAAATGTGTCCTGCTTCGGCATATGTATCGAACCCTCGGAGCAAA^ 

ATTAGTACGTTTGGGTCGGTTGCTGTTATTCCTTGGGCTCGGAAAAAGAGTGCC^ 

GATTTGGCAGACTGGCCGCCTAAT 

sequence s4C3_l_R 

CACTATAGGGAAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCTACTAGTCATATGG^ 

ctgtataagagtoiggattagaggacatgcgccggg^ 

ATTTGCGGAAACCACAGACTTTTTGCGGCGAATGAGGATAATTGG 

AGCGAGAGTGATAAAAGGAAAGCC^GGAATTAAAGCGAGGAGCATTAAAACCACAGCGG^ 
CGACTGAGGTTGTCTGGCAATTTG 



4. p3F4 



similarity to invG (S.typhimurium) 
sequence p3F4_l_U 

TGCAGGCCGACTCTAGAGGATCCCCGGGTACCGGTAATTTCTTTAACCTCGCATCCCGGTGGATGAAAG 

GA^TTCTGGCTGCGTAAGTAATGAATGAACCGCCCAGTAGATAAAATATTGAAAGTGATAACC^ 

TTTTAATAACGATGCACMATATACATATAACATC^GGCATCAAACCAG^ 

TGCC^GGTTATTCAAACTATCGACCGGTGGTCCAGGCGGGAATTTTTCCACTAAATGTA^ 
ATGGGCTAATTGGTATAGGCGGAT 

Figure 6 
5. p7G2

similarity to yscC (Yersinia spp.) 
sequence p7G2_l_U 

CCTGTGATTCCGGATGAAATAGCTTTTACGAAAGCTGTCAGACNTGCTGAAGAATACGCTGCAAATGGT 
AAGCTTGTAACTTTTGGGTATTGTTCCAACGCATG 

GTTGATAGGAAATGACGCTTATGCAGTGGCTGAATTTGTGGAGAAACCGGATATCGATACCGCCCGTGA 

CTATTTCAAATCAGGGGAAATATTACTGGCCTAGCGGCGATGTTTTTATTTG 

AAACGAATTAAACGTATCTATCACCCCCAAATTCATACAGCTTGTGAA 

sequence p7G2_3_0 



TTACTAAACAGGGCCCCGGACCaTGTAAACACCACGCtTGCCAAcACTAAAAAACGATGCtTGCcGTAA 
AAAAATTGAAcGTTATTTACTTAATAcGCCTATTTTATTTACATTATGCACGGACAGAGGGTGAGGATT 
AAATGGATAATATTGATAAT AAGTAT Ac TCCACAGCT ATGT AAAATTTTg GGGCc TATATCg GAT t TGg 

TTGtTTtTAATTTAGCCtTATGGcTTtCACTAGGATGTGTCTATTTTTTTtGTGGtCAAGCACAGAGAT 
TTATTCCCCaACCACC 

sequence p7G2_l_I 

TTTCCTTGCCGTGACAGTCCGGGATGCGAGGTTAACGAAATTACCGGCACCAAAGCTGTGGAGGTGAGC 
GGTGTCCCCAGCTGCCTGACTCGTATTAGTCAATTAGCTTCAGTGCTGGAT 

AAAGACAGTGCGGTGAGTGTAAGTATATACACGCTTAAGTATGCCACTGCGATGGATACCCAGTACCAT 

TATCGCGATCAGTCCGTCGTGGTTCCAGGGGTCGCCTAGTGTATTGCGTGAGATGAGTAACACCAGCGT 
CCCGACGTCATCGACGAACAATGG 



6. p9B7 

similarity to fliQ, invX (E.coli) 
sequence p9B7_l_I 

CATGAGTAACCTACCCAACTGTAATCTTTACCAATATGCATCATAATCTTCTGCTGGTAAATGATTGG^ 
AATATCGGAAAGGTAAGTGACATAAGCACGCCATTACGTAAAAGTGCGGCCCCTAAACTGCCACTTTTT 
AATAAGGGAAGTAATAAAGAAAGGCTCAATGGTCGAATAAAAGCC^ 

TTTACCTGTTGTGCCATTCAACCATGCTCTCCAATTCGTAACATTATCTGCCGGGTATAATTCAACAGG 
ATACCGCT AAGCCA TGGGT AG 

sequence p9B7_3_0 

ATTCCAGCCCCCGGGCCATCTAACCACTATGAACAATCATCTTCTGGGTGGACAATCATTGGTACCATC 
GGCCAGGCTTGTGCAATATGTATGTCATCACGTAAAAGCGCGGCCCCTTAATCTCCCCATTCTTCCTTA 
AGGGCAGTTATCACGGCTGGCTCAATGGCCGGCTTAACAGCCACAG 

7. s6F5 

similarity to yscU ( Y . enterocolitica) 
sequence s6F5_l_0 

GAGGCGCGTCTTCGGTTGAGGGTCGCCCTCCAGATCTTTATGCTCCTCTTTTACGTCATCTTTACTCAT 

TTTAAGATCTTTTCTAATCTTATAATATTGAAAAGAATAGTCCAGTATGCCAACGACGAAATAAAGAAA 

CATCACCCCAACCCATAACCATTTTTTCAATGATGAAAGCACAAGCACGCCACAGGCTA€ACCACAGCC 

CGGAGGGGGCCGGAAAGTGCTGGGATCTTGATTAATGAAAAAGGCAAAGGGAAGAGATAGGATGATGC^ 
TGCTGCTTCCACGCAGATTATTCATCTTCC 

Figure 6 
B) new sequences without similarity to entries in DNA or protein
databases: 

1. s4D10

sequence s4D10_l_U

AGTTGCCGTATTTATTAAATATTCACCTCAGGTC^ 

TACTAGAGATATCACTCCCTGGGTTGCAATACAGTACGATTAGTTATCTTGATGCAGC 
AGAATGGCAGCTGACGTACCCGCGAGACAAACATTCTGGATTAT 
AAGGTGGTGAAGTGGTTGATGAAATACCCCTATCQCTTGCATGTTATCGCTC 
AGCGGGCATCCTCGATCGGCT 

sequence s4D10_l_R 

CAAGAGACAGATCCAACTCGGGCCGATCGCCATAACGCCAGCAGTT^ 

CC^GCCATTCCGGTAC^GCGTAACGAGCAGGTTGCCAGAAATAACGATAAAGTT 

CAGGTCGGCTCAAAAACGGGGTCICAGGCAAAAATAGCC^ 

TCTCAACGATAACATCAACGGATAAG<Xn , A 

GGATAACGTCCATAATCCAGA 



2. s4H10

sequence s4H10_l_U 

AGGGCTTTATTGATTCCATTTTTACACTGATGAATGTTCCGTTGCGCT 

TCTAGAGTCGACCTGCAGAACCGAGCCAGGAGCAAATTAATTTTTTTG 

CATCCACCAGTAACGCCAGTGCTTTATTACCGCAGGTT 

TAACGGTAGGCGTCGATTATCTTGTCAGAATATCAGGCGCAGCATC 

ACATGGCATGAAGGGGCAACCC 

sequence s4H10_l_R 

CACTATAGGGAAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCTACTAGTCATATGGATTCCTAGG 

CGGCCAGATCTGATCAAGAGACAGATCCAACTCGGGCCGATCGCCATAACGCCAGC^ 

AAAGCCCAGCTTATCCAGCCATTCCGGTACAGCGTAACGAGCAG 

CAACACCTCGGGATCAGGTCGGCTCAAAAACGGGGTCTCAGGCAAAAATAGCCGA 

CCTAATAACAGTCCTGTCAACG 



3. p4G5 

sequence p4G5_l_0 

CCCCCCCCCTTCTCCTGGCTTACACAGCCCCAGACCGGCGCTGGAAAAGGCCATTCCC 

GGCCAGCAACATATTTTCACGCGCCGCCAGLATCGTGGCCGTAACCCACGGCTTT 

AATCATCGCTATCGCGCCAATCGCCAGGCTGTCGGTAAACGGCGTGGCGTTGLAGCGCGCTCT 

AATCGCATGCGTCAACGCATCGATACCGGTCATCGCCGTC^CGTTTGGCGGAACGCCTTCGGTCACGGA 
AGCATCAAGAATCGCCACGTCCGGC 

sequence p4G5_l_U 

CGCGAACGTGCGCCGCAACTGCTTGTGGACGGTGAATTGCAGTTTGACGCCGCTTTCGTGC 

GCCGCGGUUIAAGCGCCTGACAGCCCGCTGCAAGGCCGCGCCAACG 

GCGGGCAATATTGGCTACAAAATCACTCAGCGTCTGGGAGGCTATCGCGCTCT 

GGGCTTCX^GCGCCGCTTCACGACCTCTCCCGAGGCTGTAGCGTGCAGGAAATTATCGAA 
GTGAGAAAACCAA 

Figure 6 
4. p7A3

sequence p7A3_l_u 
CC^CCTAGCATGCCTGGCGTTGTCCGGTTA 

tctcgaatcatg<xkxxrrcatgtatcgggatg<;tgtaatctgtgatgactt 

ggatgttttggataaaaatgggttacccgcatgctgaagtatccagcgaagggc 

ttcatgatgatatacaaatggatcagcaatgcgg<^gcttcaac^ 

TATTGCACTGGCAGATTAGTCACTCTC 
sequence p7A3_l_I 

CCCTTCCi^GGCTCGACAGGTACACAGCCAGCCACTGGTGCAGGCAGTT 
GGAGX^TATCCTC^TATATTAAAGAAAC^ 

AAAATGCGTTGATGAGATTCATCCAGCACACCACTGATAACAAAAGA 

ACAAGCCCCACTAAACCGCTCTCTATTATCGCAGAAATAATATCATCCCCCTGAGACTGATGAGAGT 
CTATTCTGCCAGCGCAAATAACCC 



5. p10E11
sequence p10E11_l

ATACCGAGTATTAAGCGGCTGTGTAACATCGTCATCCAACAACATA 
AAACCGCATCGTGTCATGTGCCTGTT^AGGGTCGGGTCTTTTTTCATG 

ATACTGGAAATTTCCCCCCACTTACTGATAAGCCCTGTCAGTTGGGTAAGGACAGAGTTAAGCTC 

GACATTTTTTGGAATGGTTATCTTTCCCCGACT 

AGACGCTTTGGTCGCCCGTAGGGCACC 

sequence p10E11_U

GCCGTATGCCTGCAGTTGCCCGGTTATTGCTCGTCAAGCGAACCGATGCCAAAG<3TGA 

GAATCATGGGGGGTCATGTATCGGGATGGTGTAATCTGTGATGACTTATTGGTACGAG^ 

GTTTTGGTAAAAATGGGTTACCCCCATCCT^ 

gatgatattcaaatgggtcagcaatggggcaaggttcaac^ 
ggactggcagattagtcactctca 



6. s4B9 

sequence s4B9_l_o 

gg^gacctgcccgcggcgcaactttc 

agotacctaagccttgtcttgcctatgtgacaatactgcttggagaacacccggacgtccatgattat 
cctatacagatcacagcggatgggg^tggtgaatcg^ 

attgctattgagatagaaaaacaccccgcttcaacttggattttgaataatgtaatacgcaatcacc^ 
acactatattcgggtggcgtataa 

sequence s4B9_l_R

ttcgagctggg<xjvccgctaatatctttaacctcgcatcccggtgatgaaaggatattctggctg^ 

AGTAATGAATGAACCGGC(^GCAGATAAAATATTGACAGTGATAACCCGATGTTTTTTTAACGATGCAG 

GCTATACATATAACATAGCTGGCCACCAAC^CAGCTGAAGTAAATCATATTGTTGCTGCC^ 

CACACTATTGTCCGGCGGGCCAGGGGGGATTTTCCCCCTAAATCTCGCTGGTTCTCAAA 

7. p4F8 

sequence p4F8_i_I 

AGTCTACGATTTCGCTATATCTTCTCTTAATCATGGCCGCCATTTGTGGATGCGATTTTAAAATATCCG 

GGCGATCTTTCATTAAAAAATAAAGATTCCCCATGACTTCACAGATAAAGGTATCGGTAT 

TACGTAACAATTCGTTCTCTTCGTGTGCCTCCATGATGCGA 

Figure 6 
cattatgaacccgaaatctttctctttcx:gatg^^

TAAGCCGCTCTCGTGTGCCGGCGC 

8. p7B8 

sequence p7B8_l O 

ck:gcccottaattggttgaggcggctggtattcttgtaagggtaatactagcgagaccc^ 

ccccggggacactttttactctcagattaccgccca^ 

caattcctgtaccttcx:gaatttgtctctgcttgataaaaagcaga^ 

TTTCAATCCCCCCACCGCTATCGCTAACCAGAAATATTAATTGTTCCTCACCAAGAT 
GTATCCCTCCCCCCTCGGGAAAT 

9. p8G12 

sequence p8G12_l_I 

GGATAAGATCCCGGATAAGXATGTCAGGCTCGTATGCACAACAGGCATTATAAACCTCTAGAC^ 

TAACATGCTCTACTATTTTAAAATGAGGCCAGGGTAJVTAAGGCATTCATAATGCCG 

CATGATCGTCTACTAATAAGATCTTATATTCTTT^ 

AAGTAATGGTGTAGGTTGTGGAGATCATACGTATTTTCTGGCGTAAGTCGGTTAGTTCCTCCAGCGCGA 
TGATTTTCCCCATTTTTACGCGAT 

10. p9G4 

sequence p9G4_l_0 

TTCCATATTGCTCGTCCGGGGAGCGTGTTAATTCTTGATGATATACCAATGGATCTGCAATGGCG 
GTTCAACCATTACTTGGAGATATTCCCGGGTTATTGTACTGGGAGATTAGTCACTCTOITCA 
GGGGGTGATGTTATTTCTGGGATAATAGAGCAACGGCGTTAGCAGGGGTCG^ 
CWCGGTGCACTTTTGCGTATCACTGGGGTATCATAACTGAATCTCA 

sequence p9G4_l_u 

AATTCTTTTACCTCCATAAGCTGCGTGGCATAGCGATACAGAGTATTAAGGGGGTGTGTTACATCG^ 
TCCAACAACATACGCAGCGAGCCGCCACGCCGGAAAAACCGCATCGTGTCA^ 

GGGTCTTTTTTTCATGAGTACGTGTTCTGCGCTATCATACTGCAAATTTCCCCCCACTTACTGATAAGC 

CCTGTCAGTTGGGTAAGGACAGCGTTAAGCTCCTGAGACATTTTTTGAGTTGTTATCTGCCCCCCGACT 
CATAAG AT CGGG T AT TCCGCGGTGG 

11. p9B6 
sequence p9B6_l 

ATATCCCTAATGCTTTTCCTTAAAATAAATACCACGGAAGGA^^ 

GCAATGAACATCCGCTTTATTCCTGAAAACGATTACTCCGGCGCACGTTGTTCTGGCGTTACCTGAGCC 
AGCAAACGATATAATGGGGTGGTGACCCGCATACCGGTCATTGGCATCCCATCCA 

AAACTCATTAGGCCATAGGTAATATCATTAAGACGCTCTAATAAATGAGGGTGGGGGGCCCAAACTA 
ACTCCAGTATGTATTGAGTCA 

12. p6G5 

sequence p6G5_2_I 

CCCATGGGCGCAATTTGTTGCGCAGCGTTTACCCGACCATCGCGTTTATGAGCTGTAATTCATGGGGGG 

TAAAAACGGGCGTGACGACCCCAACGGAAGATAAGGCCGGGCTTAAACAGGAGATTATTGCTAA 

AGCGCAAAGTGTTGCTCGCCXLACAG<^GTAAGTATG 

AGCGCTTTAATGACGTGATTACCC^CGTCAATCTGCCGCCGTC7\GCC^GGTTGAACTGAAAG 
CTTTTTGCGCTAACG 

Figure 6 
Figure 7 a
DNA sequence of VGC II from centre to left hand end

ctccacaaccgacccaggaccaaattaatttttttcaacaattcctcmagatcaaccatccaccactaaccccagtcct 

» : - - ♦ 80 

GACGTC TTCGCTCGGT CC TCCTTT AATT AAAAAAACT TG TTAACGACTTTCTACTTCCT ACCTCCTCATT CCGCTCACGA 

* lowrarskli flnmC* khkhppvtpvl- 
t> C HTEPCAN* rr-TJACR«SlHQ'RQCr. 
c AGPSOtOIHrfCOLUKOCASTSMASA 

TTATTACCGCaGGTTATGTTGACCAGACAAATGGATTATATGCAGTTAACGCTACGCOT 

31 ♦ * --- ♦ * 16 

AATAATGGCGTCCAATACAACTCCTCTGTTTACCTAATATACCTCAATTGCCATCCGCACCT.^ATAGAACGGTCTTATAG 

t VYRRLC* POKMl!CS*R*AS:iLPCYH- 

* ITAGYVOQTHGtYAVWCRRRLSCONI 

C LtPOVHLTROHOTMOtTVGVOYLARIS- 

AcGGCGCAGCATGCCAAGCGCTTAATAAGOGGATAACATGGCATGAAGGTTCATCGTATAGTA 

161 — ♦ ♦ - ♦ 24i 

TgCCGCCTCGTACGCTTCGCGAATTATTCGACCTATTCT^ 

* G AACQALNKLONH A'RFI.V't rilSL 
C T A Q H A K R LI SWITWMEGSSYSI SYCPY 

C RRSMPSA* AG'HGMKVHRlVf!, TVLT* 

CCTTCT7TCTTACGGCATGTGATGTG<^TCTTTATCGCTCATTGCCAGAAGATGA^ 

?<1 " " — — * ~- — 321 

G CA^GAAAG AATGCCGTACACTACACCTAGAAAT AGCGAGTAACGGTCTTCTACTTCGCTT AG T T 7 AC6ACCG T AATC AA 

start yscJ*? 

J RSTLRKV M W 1 r T a H C Q K H K R . I K C W H Y t 

~ ^ L S Y G M* • CGSLSLIARR* SCSNAGITY- 

f f LTAC OV OL YR S L PtOCANQK LA LL 

ATGCAGCATCATATTGATGCGAAAAAAAACA(^ 

♦ • * * ♦ . » ♦ < 00 

TACGTCGTAGTATAACTACGCTTTTTTTTG TCCTTCT 

start yscJI 

c s 1 : *- B ft * t T r » RGWCHLTCR AVGSLLM- 

A A S Y • C tKKOECOCVTLRVEOSAVY- 
M0HHI0AKKNRKRMV - PYVSSSRQflN- 

TGCGCTTGACGCTACTTAGACTTAACCCTTATCCGCATAGGGCAGTTTACAACGGCGGATA/'.GATGTTTCCGGCTAA 

, . — ^_ , « eo 

ACGCOACTCCGATGAMCTGAATTGCCAATAGGCGTATCCCCTCAAATGTTGCCGCCTATTCTACAAAGGCCCATTAGT 

**.*IL*L#IGY PH RAvywCG* C V S g • S 

CC " G "LCLTVlRlGOrTTAOKMrPAK 0 - 
" NEAT * T * RLSA - GSLORRIFCrPLls- 

GTTAGTGGTATCACCCCAGCAAGAACAGGCAGMGATTAATTTTTTAAAAGAA 

"° "* * * * 4 • . S60 

CA^TCACCATAGTGGGCTCCTTCrrTCTCCGTCTTCTWTAAAAAATTTTCTTGTTTCTTA^CTTCCT 

'•'SGI TPGRTGRRLI F " KNKCtKCC* V R . 
"VHPRKMROKlMrLKCORIECMLSQ 

ATGGAGGGGCGTGATTAATGGCAAAAGTGACCATTGCGCTACCGACTTATGATGAGGGAAGT.^CGCTTCTCCGAGCTCA 

bol ... . . . . , „ 

* 640 

T ACC TC CCCGCACTAATT AC CGT TT TCACT GG TAACGCG ATGGC TGAAT ACT ACTCCCT TC AT TGCGAAG AGGCT CGAG T 

HR OVINGKSDHCATOL-- G K • R F S r L s _ 
G G A ♦ LMAKVT ! A LPTYOCGSmaS p"ss 
"ICRO'UQK - PLRYRLMHRCV7 LLRA0 . 

CTTCCCGTATTTATAAAATATTCACCTCAGGTCAATATGGAGGCCTTTCGGGTAAAAATTAAAGATTTAATAGAGATGTC 
^ ' ' * ' * * 7 ?0 

CA^CGCCATAAATATTTTATAACTGGAGTCC^GTTATACCTCCGGAAAGCCCATTTTTAATTTCTAAATTATCTCTACAG 

CK, VKirTSGOYGGlSGrN-CrNR Dv 
VAvr i«*srovNHCArRvr:iKOLltM'; 

»■ P * L • * I M L R S I W R P r G • K L K 1 * • R C 0 - 

Sequence 1 Figure 11 
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VTCCCTCCCTTCCAATACAGTAACATTACTATCTTCATCCACCCTCCTCAATTCACAATCCTACCTCACCTACCCOCC/ 

' " 60C 

fTACCCACCCAACCTTATCTCATTCTAATCATACAACTACCTCCCACGACTTAACTCTTACCATCCACTCCATCGCCCCT 

» * w V A I « • i) Y L 0 A A C * JONGS' R fRt 

i f I. 0 : 5 K I S | L ' H P A L f B f* V A 0 V f j, . 

cliiCNTVKLv£*CSLLNSEW-t.TYPP 

CACAy?.*CATTCTCGATTATCCACCTTATCAACCCCAATAAACCCAACCTCCTCAACTCCTTCATCAAATACCCTTATCCC 

001 * - - t aeo 

CTCTTTCTA.AGACCTAATACCTCCAATACTTGCGGTTATTTCCCTTCCACCACTTCACCAACTACTTTATGGGAATAGGC 

T s I I. D Y G R V Q R ' RCGGCvvOCIFLS*- 

CTrWIHOVlNAMKGKVVKWLMKYPYP 
H > K S G L W 1 L S T P 1 K G R w • S G ' ■ H T L I F - 

Tn insertion PMMll 
0 



TTGATGTT.^TCGTTGACAGGACTGTTATTAGGACTGGGCATCCTGATCGGCTATTTTTGCCTGAGACGCCGTTTTTGACC 
A^CTAC=ATAGCA*CTGTCCTGACA*TAAT^^ 



D V V D R 7 V | RSGHPDRL TLPETPfLS 

LKL5 1TGLLLG.VG1 LIGYFCLRRPf- i . 
C 5 • 0 D C ? * C w A s * S A I f a D A v r c • • 

CGACCTCATCCCGAGGTGT7GCA^CTTTATCGTTATTTCTGGCA^CCTGCTCGTTACGC T C-TACCGGAA7GCCTGGAT.-.- 

?PI " 1040 

GCTGGACTAGCGC7CCACAACGTTGAAATAGCAATAAAGACCG7TGGACGAGCA.i.TCCGACATGGCCT7ACCCACCTAT- 

ft PCPEVLOLrR VrwOP AR> A . PCWLOl - 
.PL: PRCCMf!v:scwLLV»I . R m r, w ■ < 
7 - 5RGVA7LSLFLA7CSI. h C T C H A G • " • 
Tn insertion P11D10
0 

CC7GGGCT77CA7C77CAAACTGC7GGCG77A7GGCGA7CGGCCCGAG7TGGA7CG7CT7C77GACAGAGCG77AAA7AG 

" " - 1120 

CGACCCC-^-.AGTAGAAG777GACGACCCCA/*TACCGC7AGCCGGGC7CAACC7AGCAGA.-GA : >.CTG7C7CGCAAT7TATC 

LGrhLOTAGvr-AIGPSflvrLTEP - ' :> 
W A • : TKLLALWRSARVCSS;* c S V K • 

ACl«5SNCHRiG0ftPCL0RL.DPALhP 
AC7^GAGCAAGC7C7G77A77CCAGCCTS7TrAAA7GACAGGCAAAAACGGCAGGT7CG-C-7GCGCCGCGTATATCG: 

- - .?oc 

7 GAT 7 7 7 7 C 7 7 CO AG AC A>\7 AAG C 7C GG ACA A"KT 7 7 AC7 G7 C CG 777 7 7 GC CG7 CC - v* GC - 1- r.-.C GCGGC GC ^ 7 A 7 AO 
l - L T 0 P v * !1 7 G K n G *■ ft v ► : 

: k - - l c y s £ l r ► • o a k t a g r- : : a a > : 
i p 5 s v i p a •: l n o R o k k o - - f f « : ; 

CATTTGCC77TGGGCTGGGAT7A7TCAAACTCAGCTCTAGTGACTATTTTATGCTACCAGAG7ATCGCCAA7TGCTTCT; 

i?C: - - !?90 

G7AA.tCGGAAtCCCGArCC7AA7AAGT77GAG7CCACA7CACTGA7AAAA7ACGA7GG7:::A7AGCCC77AACGAAGA- 



scart i c £ "* 

»■ L f ■ t C w c y S m 3 0 v V 7 ! L C > " o m r " 

I : l w ^ G S ! 0 7 0 V * ' L r Y A 7 - & A S A S * 

r '- r g l c i i k l f c s o t r ,**',«••"♦? o l. • - 

CAC7G0777AGCGAGGA7GAGA7C7GGCACC7ATA7GG77GCT7GGGGCAA^GAGA7CG:-\-AT7AC77C::tCGCAAC- 

i ;t . — . j q . 

G7CACCAAA7CCC7CC7AC7C7AGACCC7CGA7ATACCAACCAACCCCC777C7C7ACCC7 , '7^7GAAGGAGGCG77Ci 
3 G ;. - h i P <; .', y : m v G k G K I r I. r. , r l r ■ - 

v \ p c r< l a a : i. v g a k f w «, : * ; s a : 
".'*»! ' ■ : r. :***'.: : " ** !. Of"!-*" . : '•• q. 

GA7GCAACAAAC7GCA77GCAGA7CGC7ACC3CCA77C77AA7CGGGAAGCGCA7CACGA:C:GGG7777ACA7GCGC- :■ 

; 3l : K(, 

77ACr.7 7r,777GACG7;^CGTr7An:CA7GGCCGTAAr,AAT7AGCCC77CGCC7ACTr,T7irr.rCCA/\AA-;.-TACGCC; ' 

: • w " s g r. j- ^ • * - . 

D A T C I A !> r t P « ! • S Z> S A * - ; ' ■ y. - r 
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TTAXJTATTATTACCCCCTCCCCAGCr.TATACTTTGGCCCAACACTTCTCTTACCCAr»ATTATrTTCATCGACCATTTCCT 

»«*' - - li?0 

A^TCATAATAATCOCCCACr.CGTCGCATATCAAACCCCCTTCTCAACACAATCCCTCTAATACAyS.CTACCTCCTAAACCA 

LVLLPPfQKJ LMPKTSI. Tt ! ! I' K C H L L - 
i \ > f l * S v v rcRK*. Li. Pf- L S K 5 1 c r - 

Si ] TJ 5AAVTLACDrSVRD> L »l C A T A 

A7GAGTTTTACT7CAC7TCCTC7GACGGAAATTAACCATAAGCTACCCCCTCGAAATATTATTGACTCACAGTGGATAAC 

I S?l - ♦ 1600 

TACTCAAAATGAAGTGAAGGAGACTGCCTTTAATTGGTATTCGATGGGCGAGCTTTATAATA^CTCAGTGTCACCTATTG 

- VLLHTL' RKtTISVPLCl 1. L S H S G • M - 
C F i F 7 3 S D G w • p • A 7 R S K Y Y • V T V 0 M 
n5:TSLPLTClHNKLPARH|iCS a 0UIT- 

ATTACAATTAA.CTTTATTTGCGCAAGAGCA^AAGCTAAGAGAGTTTCACATGCTATO 
1601 * * ♦ • - ♦ 1660 

taatgttaattgaam-aaacgcgrrctcgttcttcgatt 

v n i. s lrksnklrefhmll' a p l 7 v r 
:t:^.-:carat$* csf7Cycelrlp-g- 

' L £ 7 L r A 0 I O 0 A K fl V S H A I V S S » V S K A - 

CTG^^Vv^TCATCCGAGACGCXTATCGTTATCAGCGTGAACAGAA 

«* 8i - - mo 

GACTTTT7TAGTAGGCTCTCCGGATAGCAATAGTCGCACTTGTCTTTCAACTCGTCGT7GTTCTTGATCGC.i.CGAACGCA 

L K S S £ 7 P I V ! S V K R K L S S «•■'»• " R C V - 

• Vkh^^^LSLSA* 7ES-AA7>7SvLA«- 

c - ::nDAYRYOREOKvroooe:-CLR 

AAA=A7ACGC7GGAAAAWGGAAG7GGAA7GGC7GGAACAGCA^ 

" 1840 

77777A7GCGACC7T7777ACCT7CACC77ACCGACC77G7CG7ACA7777G7AAA7G77C7GC7AC7777AG77AAAGC 

K! PUK<<WKMMGWMSH* Nl tKTMXlNFV- 
K * * G r. HC5CH-ACTACKT T7RK - K S I S 
K N 7 L £ KHEVCWLCOHVKMLODC^NrjFR - 

T7CATTGGTCGA7CACGCAGCGCA7CA7A77AAAAA7AG7A7AGAACACG77C7G77CGCC7GG77CGACC-ACAGTCGG 

19«J - - --- - - 1920 

A.AG7A-CCACC7AG7CCG7CGCG7AG7A7AA77777A7CA7A7CTTG7CCAAGACAACCGGACCA-GC7GG7TG7CAGCC 

'« 5 : T 0 * 1 ILK IV" NRTCWpr. STTJSH 
riGRS»SASY* K * VRTCSVOLVRTTVC- 

L - A M H I K N S 1 E 0 v L L AW" 0 O 0 S V • 

TAGAC^GTCTT ^TG7GCCA7CG7CTGGCACGCCAGGCCACGGCTA7GGCGGAAGAGGGAGCCC777^777GCG7^77CA7 

U5: . . 

?00C 

A7C7CTCACAA7ACACGGTAGCAGACCG7GCCC7CCGG7GCCGA7ACCGCC77C7CCC7CGCGA-ATAAACGC-.7AAGTA 
' * C A I V W M A « P R L W R h * f fc r ; £ - . 

R 0 ~ VPSSG7PGHGYGGRG5-Lf£ : 5 S • 

Z i vKCHRLAROATAHAECGAL i L - ; h 

CC7G\- J tA^GAGGCA7TGA7CCCAGAAACT777GGCAAGCGG7T7ACG7TGA7TATCGACCC7GG777C7C7C'*CCA7CA 

GGAC-7-77C7CCG7^C7ACGC7CT77GAAAACCG77CGCCAAA7GCAAC7.AA7AGC7CGGACCaAAGAG^CGGC7AGT 

L = h • C E K L L A S G L R • L 5 *_ '.' S L r I E . 

' - « 1 0 A p n f w o * v y v u < « a h r s ; « 
PLf r . AtMRCTrCKPfTL 1 I C P G f s o o • 

GCCTGAA:TTTCC7CAACACGATA7GCCG77GAA7777CACT77C7CG7CA777CAA.CGCG77AC7GAAA7GG7'-ACG7A 

™ ei ?l6f 

CCGACT"GAAAGGAGTTGTGC7ATACGGCAAC7TAAAAGTnAAAGAGCAG7AAAGTTCCGC AA7G e C7T7ACC.^7GCAT 

1 *OHDMPLNFnrL«'i " t * ' • k 

t • " L r. T : C H • » r T r s s r o c •• * r f. - . 

l : . .- 7 p y a v r . r r : ? r »; r k a l . \ K " . , 

A7GGTC/AGAT A\AAGAGG7AGCGA7GAATA77AAAA7TAA7GAGATAAAAATGACGCCCCC7ACAGCA777 * CCCCTf • 
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CCACGT T AT AC AGGAACAAGACCTT ATTTCCCCTTCAATCTTACCTCTCC ACCAGTTAC AGGAAACGACGGGGGC AGCGC 

♦ 7 ?j» - - /J.0 

CC7CC\;T^TCTCCTTCTTCTCCA^TAAACCCC^CTTACAATCCAGACCTCCTCAATG7CCTTTCCTCCCCCCCTCCCC 

Ov r C'OCVI SrSMLALOCLOCTTCAAL- 

f-.-SifJKKLrRl.QC' LSRSVRSRRCQR 
P G ACTR'CY FArNVSSPGVTGMOGGSA 

T CT A7C *C ACGATGGAAGAAATAGG AATGGCGCTGACTGGT AAACTGCGCG AAAATTAT AAATTCACTGATCC TGAGAAA 

2321 - 2*00 

AGATACTCTGCTACCTTCTTTATCCrTTACCGCCACTCACCATra 

Y E 7 M t C IGMALSGKLRCNV K TTOACK 
S K -i •» W K K • E W R • VVNCAKI INSLMLRN- 
L - :?GaNRNGAEW» TARKL' 1 H • C • £ T - 

CTCG^CGCAGACAGCAGGCTTTGCTGCGTTTC^TAAAACA^ 

2<0i --- aeo 

GACC TCC-rCTCTGTCCTCCGAAACGACGCAAACTATTTTGTTTATGTCCTCCTATTACCCCGTTGCAACGCAGGCGAATG 

I I * - j ? .-. L L R L I KQIQGONGATLRPLT - 
K S - S R L C C V • • NKYRRIMCQRCVRLP- 

G CT-CrAArOKTNTGC* WGWVASAT 

C C^C.-.C- -A ? - GT C AT CC TG ATTT AC^A 

- - 2b60 

GCTTCTCT'ATCACTAGGACTAAATGTCTTArGCATAGTTTAATAGCGAGMCGTTACCGCG^TGACGCCCGCCCAACA 

t L * i 0 ! DLQNAYQ1 1 ALAMA L TA-j". L 5 ■ 
« : * • V J L ! Y R M R 1 KLSLLQWRLLtaGC 
R * £ 5' PTtrvSNYRSCMGAVCfiSvv 

CAAAAA^GAAAAAACGCGATTTCCAATCCCM 

* * ?6<0 

GTTTTTTCTTTTTTGCGCTAAACCTTAGCGTTGACCTATGCAATGTCGCCTCCTCCCTACCCTTGJU^ 

K a <NROL0SQLOTLQRRRDGMLPrtV 
Q K R r N A I CNRNW! R YSGGGmGTCRT* f - 
K K i .•■ T R f A I ATGYVTAECGW£ LAV r S " L - 

TACTCGA-.rTTGGCGA^GTGGATACCCrrACGCTGTCCTCTCTGAAGCGTTTTATGC^CAGGCGATAGACPACCATGAAA 

?64! — * ?'2C 

ATGACCr7C.i>CCGC77CACCTATGGCATGCGACAGGAGACACTTCGCAAA^TACGTTGTCCGCTATCTCTTGCTACTTT 

r w fc, i tv w I PYAVLSEArrA7GP«JC»* N 

T G 7 W s CGYRTLSSLKRfKOC-" ' I* " 0 £ i- • 
LL - * r. v 0 T V R C P L • SVLCwi* - • T7m». 

7GCCC77.VT rCCAG7GGTTCAGACGCGTGGCAGACTGGCCGGATCGCTGTGAACGGGTCrGTVTTTTCC7AACAGCAGTA 
*Ti ♦ . 

ACGGG-.-7- G:GTCACCAAGTC7GCGCACCCTCTCACCGGCCTAGCGAC^CTTCCCCAGGCATAAAACGATTCTCGTCAT 

'- 1 I'tQTRGftLAGSL' T G f f t k S S j - 

p -:0wrnRVAOWPORCERV?» : I L ft A v * . 
^5 r :"£GSDAW0TGRlAVMG< vrC'CO- 

GCC777G->;:7TACCATATGCATCGAACCCTCGGAGCAAAGTCGTTTGGCCGCAGCATTAG7ACGTTTCCGTCCTTTCCT 

7801 

?8Bt 

CGG.AAACT70^TCG7A7ACCTAGCTTCGGAGCCTCCTTTCAGCWCCGGCCTCGTAATCA7GCAAACGCAGCAAACGA 

- * T iiMHRTLGAKSTGRSI J ? ! A s r A 

'• r^.i ! C ICPSCCSRLAAALV = lbrli 
P L « _ A 1 S N PRSKVVWPQH' r vcv v c C 

GT7AT7CCT7GCCCTTGAAAAAGAGTGCCAGCGTGAGGAGTGGATTTGCCAGTTGCCGCCTAArACAT7ACTGCCGCTA: 

?88! ? ,e( 

CAATAAGGA.ACCGGAACTTTTTCTCACCCTCGCACTCC7CACCTWCCGTCAACGGCGGAT7ATCTAATGACCGCGATG 

Vi V W f- • KRV PA* GVGLPvAA* "* I T A A T 
i. GLC^CCOPCCHICO LPPNTLLPLL 

». S L ' l. k . rSA5VPSCrAS'CKL ! h » r R Y 

96 TACTCC '' T " T ^^tttctgagcgctgccttttcagtgattggttccttgatagacttacccctatactttcttcatcgaag 

AT GACC TAT V*T\V>CACTCGCGACCGAAAJ\GTCACTAACCA/ , »CGAACTATCTGAATGGCGATATCAAAGAAGTACCTTC ^ ' ' 
T - L A f f ■ ■ L V A • . 7 r p. > ; r r , t Q . 
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ATCT TCAATCCGTT ACTCCAACAACTTC ATCCCCACTTT ATCCTCATACCCCAT^ACTC TT TTA.ACCACGAA.GATCAAC 0 

i\?< 

T v ACAACTTAGCCAATGAGGTTCTTGAACTACGCGTCAAATACGACTATCGCCTATTCAC/Vv\ATTGCTGCTTCTAGTTCC 

VOSVTpTT* C A V ¥ A 0 T R • !. «' • R 1/ p c T 
M f" N »*LLOOI.DAOrriLI PDNCf riDCDQ" 
CSIGYSNNLMR3LC* YPJTVLTTKlNV- 

TGAACAAATTCTCGAAACGCTTCGTGAAGTAAACATAAATCAC^TTTTATTCTGATACCTGGCTTTCAATATTTACCT^ 

3i21 * " " )20O 

ACTTGTTTAAGAGCTTTGCGAAGCACTTCATTTCTATTTAGTCCAAAATAAGACTATGGACCGAAAGTTATAAATCCATT 

TNSRNA5* SKDKSGriLIPGrOYLGK- 
COILCTLRCVKlNQVLr' y L A r K I • V - 
N K fSKRfVK* R 1RFY507WISIFR* 



ATTGGCTTTCTGGCTCATCATGAGGCGTCAGGATGGATTGGGATCT^ 

"* " — J?80 

T AAC CG AAAGACCGAGTAGTACT CCGCAGT CCTACCT AACCCT AGAGTAATGACTT GCA7T AT AAG7CG AAAAA7AAG T7 

L A F K L I M R R O 0 G L G S H Y * T • V ? A T V S ; • 
*LSGSS-GVRMDWDl1T£RNlQLriO 
IGfLAMHEASGWIG ISLLHV 1 F S f i. T N - 



IT'S I 



TTAGCAGGATTAGCTGAACGGCCTTTAGCAACCAATATGTTCTGGCGGCAAGGACAATATGAVVCTATCATAACGCTCGT 
AATCGTCCT.MTCGACTTCCCGGAAATCGTTCGTTATACAACACCGCCGTTCCTGTTATACTTTGATAGTATTGCCAGC; 
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S R ] S • T A T 5 N 0 Y VLAART! t: » HN^ft 

LAGLACR PLATHMrwRQGOl l T 1 : T V V - 

• 00* LNGL'OP!<TSGG>:OHHKLS - a s , ^ - 

ATTCTCTTATGTCAGATACTCAAGCAAACC^^CTTAGACGAAGAACTGCTTTTTAA^GCGTTGGCTAACTGo- \*CCCGC 
TAAGAGAA7ACAGTCTATGAGTTCCTTTGGMGMTCTGCTTC7^ 

1 LLcoi lkqtflole:llfkalanw-p a . 

rS VVRYSSKPS-ThNCFLKRWi.TGMHO- 

slmsd7oanllrrr7af*svg-l:tr 
agcgttccagggtattcctcaacga.ttatttttgttgcccgatcggcttgcaatgagttgttctccaccrctttccacct 

i,4i ~" *"* " Jb20 

7CGCAAGCTCCCATAAGGAGTTGCTAATAAAAACAACGCGCTACCCGAACGTTACTCAACAAGACGTGGAGAAAGGTCGA 

a r o c i porlfllrdglamccs?p: =ss- 

SSflVFLNDY TCCAMGLO* V v •. f L r ? A 

2 -i f G 5 S T J 1 r*. ARWACNELF ; 7 J ' 0 L - 

■::GCCCAGC7CTGG77ACGA7TACA7CA7CGACAAA7AAAATT7CXTGGAGTCGCAA7GCC77CA7GG77AG37GAGGCA 
3GCGGC7CGACACCAATGC7AA7GTAG7AGCTG777ATT77AAAGXACCTCAGCG77ACGGPAG7ACCAA7Cr^C7CrC" 
- i L W L R I H m R 0 1 KF?CVAH*SW. £ - 

1 r s s c y o y i i o k • k r ? c s o c v u - . , K t . 

* * A L V T I 7 S S T » K | SWSRNAFMVi • G S • 
G7CAGCGCGCAACAG7GGC7CAG7G7ATGCGCGGG7CGGCAGGA7A7GG77CTGGCGACGG7GT7A77 A*.TC jC7A7TGT 
7AC7CCCGCG77C7CACCGAG7CACA7ACCCGCCCAGCCC7CCTA7ACCAAGACCGC7GCCACAA7AAT7,:gCGATAACA 

start lcrD*
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G^7GA7GCTGTTACCcTTGCCGACC7GGATGGTTGATATCCTGATTACTATCAACC7TATGTTTTCAG » GA7 " r ~CCC 

J66 — " 

C7Ar7ACGACAA7GGoAACGGC7GGACCTACCAAC7A7AGGAC7AA7GATAGT7GGAA7ACAAAAr,7.-A:TA0GACCAG~ 
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TAA7YGC7A777ATC77AGTGACCC7C7CCA777ATCGG7A777CCG7C777A77AC77A77AC7ACA77/-7 - 7~G777 
~> 7i > 1 ♦ - - ■ Je4 .• 
A7TAACf.A7AAA7AGAA7CAC7GGGAGAGC7AAA7AGCCA7AAAGGCAGAAA7A^»7GAA7AA7GA7G7 

lAlfLSOFI. 0L5VFPSLLLIT7L: 9 L 

!. r • l v t l s : r n r r n l y v l t l h y : v : 
nc yls - ■ psariGisvriTvrri isr - 

TCAC7CACAA7CAGCACA7CACGGC7GG7AC7C77ACAACA7AATGCCGG7AA7A77G7GGA7GCTTTCCCT^AGT77CT 

- - ♦ mo 

AG7GAG7C7-AG7CG7G7AG7GCCGACCA7GACAA7G77GTATTACGGCCA77A7AACACC7ACGAAAGCCA77CAAAC* 

SLT ISTSRLVLLOHNAGN I VOAFGKFv 
M S 0 S A H H G W y C Y N I MPV 1 LWMLSVSLS- 
THHOHITAGTVTT'C^'YCGCFR'VC 

CG7AGGAWAiA7C7CACCG77GGG77GG7CG7A777ACCA7CA7TAC7A7CC7GCAA777A77G7CAT7ACAi^AGG7A 

- • ♦ - 400C 

GCA7CC7CC7T7AGAG7GGCAACCCAA.CCAGCA7AAATGGTAG7AATGATAGCACGT7A.AA7AACAG7AA7G7TTTCCfT 

VCC». LTVOLVvrt'l 17 1 V q f I V I T X -3 : - 
• CC:DPLGW5 VLPSLLSCNLL5L0 " V 
R R R .". £HRWVC(.!YHHYYRAIYCHY>'Ry_ 

7CGAGAGGC7«CCGGAAG77AGCGCACG777C7CGC77GA7GCCA7GCCAGGCAAACAA^TGAC7A7CGA7GGCGA77T:- 

- * --- joac 

AGCT C7 CCC.-.C CCCCT 7CAA7CGCG7GC AAACAGCGAACT ACCCTACGCTCCG7 7TGTT 7ACTCA7AGCTA.CCC IT AAA Z 

ER V-. E VSAfiFSLOCMPG .KOMS: 0 Z 0 l 
S R G W - K L A h V S it L H 6 C 0 A N K ■ V S H A : : 
R E G G j S * P. T f L - . • W D A R Q 7 hi C : R ■ W 3 r - - 

Tn insertion P2D6
U 

CG7CCCGGAG7 7 VTCG A7GCAGACCA7GCCCG7ACA77AAGACAGCA7G7CCAGCAGGAAAGCCGC777C7CGG7GCG A ~ 

< 0B 1 * ♦ * U60 

GCACGGCC7C.-.MAGC7ACC7C7CG7ACGGGCA7G7AA77C7GTCG7ACAGG7CG7CC77TCGGCGAAAGAGCCACGCTr 

RAGv ; 0 ADMARTLRQHVO0E SR f L G A K • 
VPJI L jmQTmPvh* D S M S S R K A A F S V R ;•: - 
CRS RCRPCPrlKTACPAGKPUSK CD 



4161 



GGACGG7GCG-7GAAA777C77AAAGGCGA7ACCA77GCCGG7A77A77G77G77C7GG7GAACAT7A7CGGCGG7A7C- 

CC7GCCACGCT = G777AVVCAA777CCGC7A7GC7AACGGCCA7AA7AACAACAAGACCAC77G7A^7AGCCGCCATAG- 

C- G A - -. • v K G P 7 ' I A G I I V v l > k ; i G- G I 
7 V r - l L V A ! K L P V L L L F « - * 5 - . > 

CFiC: : - • RPYfCRYYCCSGlH'i r r - 



77A7CCGT--.::3TACAA7A7GATATG7CGA7GAG7GAGGC7G77CACAC77A7AGCC7AC7GTCA^7CGG.-.GATGGT7-: 
«241 - „ ♦ 

AA7AGCG.-T-.G:ATC7TA7AC7A7ACAGC7AC7CAC7CCGACAAG7G7GAA7A7CGCATGACAC7TAGCC7CrACCAA>." 



I A : ; % \DMSMSEAVHTYSV!5|G'^Gl. 
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ACACCCC77 T AAGG7AGCGACGAC7 AAAGGGAA7CGCGCCCT7AA7AACAG7GGGCACAGGGCCCAC7C777GCGG7C77 

cgo; s l. : slsagi i v t k y p c e ► - ? . 
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GGACCGC7CT TTC A/\CTCAAGAG777AACGG7C7G7TGGAG7CAGCGAG7A7AA77GGCGACACCAAAA77AGGAGGAGC 

l t : . : a p o f 0 5:. : l7av. l - l : - 

* * V " ' i. * P 0 K L 5 K £ * • f L w ; ; s c 

PCI*' E •' T« . v 7 T r- V A »- I t.- f t : f f. : r i . 



Figure 11




EP 0 889 120 A1 



4 <e i _^™_ nT .7 CCC ™^^ 
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4 88! 

CAATGGC77CCCC7ACCCCG7CTACACAACCGAA7G7777C7G7ACCGGG7^ 060 
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TTAGCCACT77GAAACG77GCCGAACA7AG7C7CCCCCAAAGATAATCTC7AAA^ 

1 A f L 0 R L V S E H V S I R 0 L R • r - 

L L H ! T \ r 3 " 



^ L K L C K G L » O S C F L L T. } ~- * - - ■ ^ : [> 

N " * K r a t a c ; k r* g r v • y r 
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MCTGf^CCCCACGTC^JsAAACATCTCCTGATCTTCACACAATATGTCCCTATCCCCCTTCr.TCCTCATATTCTGCGTCCT 
TGACCCCCCGTGCACTTTTTCTACAGGACTACAACTGTCTTATACAGGCATAGCGCGAAGCAGCAGTATAAGACGCAGCA 



^ W A f ft c K 0 V L M I. T r , v ft , A L r a H , fc R 

T c r m v »• k h s • - : Q N K s v s K f V V | r " V 

'"CAT* tfPCPOVOBICPYRASSSYS AS S- 
CTTAATCCGGA^GGAAAACCGCTGCCGATTTTGCGGATCGGCGAAGGTATTGAAAACCTCGTGCGTGAATCCATTCGCCA 



b?Bl . „ ____ _ 

GAATTAGGCCTTCCTTTTGGCtjACGGCTAAAACGCCTAGCCGCTTCCATAACTTTTGGAGCACGCACTT^ 

LNpCCK PLPILftIG£ ( GieiJLVRes^RO 
LI RKCNRCR rCCSAKVLKTSCVMp"r" 6 R 
SGRKTAADFADBRHY- K p R a * I H ' 5 "p 

SHI G * CCGCAATGGGGACCTATACTGCGCT ^^ 

CTGCCCTTACCCCTGGATATCACGCGAC^^ * * * ° 

TAMCTYTALSSftHKTOI l> 0 L ] £ Q A : K0 _ 

» 0- w g pilrcrlviprrscnlssrr'. c . 

DGNGDLYCAVVSS - D A D P A T Y R A G .» A. . 

S 4 4 1 AGTCAGCCAAATT A ^^" GTCAC ^^TCGACACCC<^CGTTTCTTGCGAAAAATT AC AGAAGCCACCTT 37TCC AC 

TCAGTCGGTTTAATAAGTAACAGTGAAGACAGCTGTGGGCTGCAAAGAACGCTTTTTAATGTCTTCGGTG^ 

5 A f L r I V T 5 ' V £ T R R f I R K ■ T r A T - ~ 

S 0 P rJ 1 ' S i- S L L S T P D V S C F. K L Q ~K P p ".- S 
VS CIIHCKrCR«P?FLAK N YRSHL \ R "a . 

" ACCCATT ™ GTCATGGCAGG ^^ 

CATCGCTAAA^CAGTACCGTCCTTAATCCTCTCCTCTCGGAATATGTTCACCATCTTTCATAACT 

VPI LSw OELGCCSLIOVVtS I OLSE-r 
YR fCHGRN - CRRALYKW* K V L T L A \ % ~e 
TDrVMAClRRGEPYTSGRKY* P * R A G " - 

5>601 G _7 G _^ 

CAACCGCCTCTTACTTCTTACTTAACTACGTTGCAGACTCCGACTTTATAGGCGGGGGGCT 

end lcrD* start yscN*?

L A r- { , r r . I C A T 3 E A £ I S A P R W L L S - C - 
K ' T M K * C 1 K P « 1 R i. K y P P p o G Y C a >; c 

" " - * * R " * * C N V ■ G - K I R r P M V ] V f , - 1 
CC ^ T7CAGGATGTCAGCGCAACGTTGtTAAATG 

gctta^G7cc?acagtcgcgttgcaacaatttacgcaccaacggaccccataaataccccctcaacacgacatI--tccg ' 



,; S i': Z 0 R M V V K C 



- , -- - „ W G ! 7 G R v v L : t, 

fi : 0 D v S A T L L n i w L P G v r M G =1 t C - « ■ 
l TK". SAORC* M ^ f, r L c Y t. w A S C A "v < 



^ tcgagaagaacttgctgaagtcgtgggcattaatggcagcaaagctttgctatctccttttacgagtaca^tcgggcttc 

ACCTCTTCTTGAACGACTTCAGCACCTCTMTTACCGTCCTTTCGAAACGATAGAGGAA^ ^ 



J'SCA' ' ' 

* 1 ' ' T f " S R G L> • w r» ^ S r £ I S r , r r K ; . s 
C T. C l A r v v G I m 5 i- A L !. 5 P r T S - I C " ^ 

- K - L ». S K G L K a a K I C * L L L R V 0 5 Z % ! 

be * ■ * CTGCGGGCAGC ^ GT ^ TCCC ^ TAA ^^ 

TGACGCCCGTCG7TCAC7ACCGGAATTCGCTGCGCTAGTCCAACGGCACCCCCTTCGCAATAA7CCCGCTC ^ ? 



'- : C* V M A I 5 D » : r f p w A K R 



* C L f R 0 ; r v G F. A |. », 
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CCTTTCGTCCTCCCCTTCATCCCCCCCAACTCCCCGACGTCTGCTCCAAAGACTATGATCC^TCCCTCCTCCCf.C^\7' 

t 9 7 1 , 

. » ♦ . 60OO 

CGAAACCAGCAGCGCAACTACCGGCGCTTGACGGGCTGCAGACGACCTTTCTGATACTACCTTACGGAGGAGGGCGTTAr 

rc ft PlOGRtLPOVCWKDyDAMppp ( :f- 
A| - vv PLMAANCPTSAGK TMHQCLLrG'W. 
L w S 3 P • W P R T A R R LLEfl L CNASSRng-' 

CTTCGACAXXCTATCACTCAACCATTAATGACGGGGATTCGCGCTATTCATAGCG7TGCGACCTGTGGCGAAGGGCAACG 
6001 , , „ . 4 4 

CAACCTGTCGGATAGTGAGTTGGTAATTACTGCCCCTAAGCGCGATAACTATCGCAACGCTGGACACCGCTTCCCGTTGt 

V R 0 P 1 TQPLMTGJ R A 1 0SVATCCCG9R . 
«""3LSLNH* • HGrALLIALRPVAKGlJr. 
5TAYHSTJN0CD5RT* ■ RCOLWR« i T ~_ 

609 1 AG7G ^ TATTTTTTCTCCTCCTGGCGTGGGG ^^ 

TCACCCAT/^AAAAAGACt^GACCTSCAC^^ " 6 ° 

vc : "SAPGVGK ST LLAMLCWAPOACSt:- 
Wvr rLLLAWGKARrWRCCVMR0TOTA 
SCY rrCSWRGEKHASGOAV - CARRR*Q 0 . 

ATGTTCTGGTGTTAATTGGTGAACGTGGACG*GAAGTCCG^GAAT^ 
5 ie) „ . , , 

TACA^ACCACAATTAACCACTTGCACCTGCTCTTCAGGCGCTTAAGTAGCTAAAATGTGACACACTTCTCTGG3r?TT T 

VLt -' '-ICERGREv'R£riOrTLSE:£TSf 
* T W c • LVNVDCK SANSS! L H C L K R p r . 
C S G V M W • TWT R S PR I HR f r T V - R D P \ 7 - 

624 1 CC ^ GTGT ~~ TTG ^ GTCGCAACCTCTG ^^ 

gcaacacagt.v\caacagcgttggagactgtctggck:ggaatctcgcgcactcccgcgacaaac^ ° J * ° 

R C V ! V V A T S D R PALE.RVRAL F V A T T 1 ; - 
Vv S i-LSQPLTD?P* S A • GRCLWPPR*- -. 

lchccrml* Qtrlrarcgavcghhds V - 

AGAA7TTT77CGCGATAATGGAAAGCGAGTCGTCTTGCTTGCCGACTCACTGACGCGTTATGCCAGGGCCGCACGGAAA7 
6371 — »_ , • _ - - - 

TCT7AAAAA-.GCGCTAfTACCTTTCCCTCAGCAGAACGAACGGCTGAGTGACTGCGCAATACGG7CCCGGOT ' " 

Err ~ 0NGKRVVLLA0SL7P. Y A R A A R n c - 
N r ■ 1 I M E S E S 5CLPTH* RVMFGPHG*. 
Rlf 5 = * MKASRLACRLTOALCOGRTE! 

6<0I CCCTCTGCCCr:GGAGAGACCCCGCTTT ^ 

CCGACACCGCCGCC7r7CTGGCGCCAAAGACC7rTTATAGCGGTCCGCATAAATCACGTAACGG7CCTGAAAAr7"773C- 

Z - K F ; = 0 R G F W R 1 S P G V r S A L P ft L L • ^ 
R s G * t T A v S G E V P 0 A V L v h C K 0 T • "*. i- 

A L ^ F I R p R F I. L N I A R R I • c I A T T F R 7 1 . 

ACGCGAATGGGACAAA.V,GGCAGTATTACCGCA7TTTA7ACGGTACTGC7GGAAGGCGATGATATGAA7GAAGCCG77G' 
6461 - ----- w . 

TGCCC77ACCC7rT777TCCGTCATAATGGCGT^^AATA7GCCATGACCACCTTCCCCTAC7A7ACTTACTTCCCCA^CC 

yscN • 

1 C K c ; " S S ; T A r T 7 v L V l G C 0 w r. £ a v g 
B t w £ * > A V L T H T I RYWWKAM? • m o t t 

CNG3sscrvRiLyCTCCRR - s R J] 

CGGATGAAC7CCC7TCAC7GCTTGATGGACA7ATTGTACTATCCCGACGGC7TGCAGAGAGGGGGCAT7ATCC7GCCA7- 
bbo I . . . m 9 

CCC7ACTTCAGGCA^G7GACGAACTACCTGTATA>.CATGA7AGGGCTGCCGAACCTCTCTCV:CCC7AA7AGGACGG7Ai 

C * S " - 7 l • w 7 Y 7 T i ? T A C R E G » L S C " • 
C E v ? L L 0 G H 1 \ L 5 R P L A C K f. « 1 \- Z ' 

H M r " s m 0 : L : r P D Tf L 0 R C ' : l ' • 
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^CCTCTTCCCAACCCTCAGCCCCCTTTTTCCAGTCCTTACCACCCATCACCATCCTCAACTCCCCtiCCATATTC.'T*- - 
664] — . t m ~" 

ctccacaacccttcccactccccccaaaaacctcaccaatcctccctactcgtagcactti;accccccct atmccctcc 6 T " 

RV C NAOPRrSSRTOP-ASSTCCDIf T 



DVLATLSRVrP 



VVTSHCMRQI. A A I I R c 



TCWOR5AArr0SLP AMSIV N WRftYCO.3 

TATGTCTATCACTGT 



CACGCACCCCGAAATGGTCCTCCAACTTGACAATTATCXrCTAACCCCTTATCCTCGCTCCTCAACTA~ " " " 



VPCA ^PGG-TVN T H«ciPARS-rR v. r 

CLALV 0£VtLLIRIGCYORGVDTD TD'K 
A w R FT R R INC* y A LGNT SC E L I Q ! L 7 

0 ] ^_ C ^ TTGATA ^ CTATCCMAT AT^^ 

TTCGCTAACTATGGATAGGCCTATAAACGTGTAAAAACGCTGTTTCATTCCTACTTCATACCCCT 



y > - 



S « • V L S C Y L H I r A T K ■ G • S M R T R A T 
* 1 CT1r POICTrLROSKOEVCGPCL' • 
K P L 1 P I R I f A H r C D K V R „ K Y A 0 P 5 Y \ *- ! 

6861 _ < ^^"^^^ 

CTTTTTAATGTGGTTTATGAGTGGCTCACTACTACCTTTGAAACGACCTCTATTAGCGCG^ ' ? * " 

end yscN* start yscO*

r " * P * T H B V 1 " E T L t E I 1 A R L K S N v A . 

c k l h o ; :, i r : s w k l c w b . s r g - *. a . T % 

""YTKJSPS P H • ti r AG DNR AA C k Q LR j- 

cgttcgaatcgcatga^ctagtcgtctjttgtccgctaataatgccttgVcgtct ' 04 ° 

A ^ S . L?YLISSMRRLLRNSRrAR R-L-oc. 
0 A y R T * saatgoy y G tad l p g a r f «• s 

K L T V L 0 0 O 0 0 A I J T C 0 0 I C 0 T R A L A V ^ : 
704 I 7™^_ A .^^^ 

a^atgctctcactttcttaattacccgaccgttccatgcaata^ ' 1 

L v P n D _ * K M * w A G K v « ' i- * i r c w i „ „ f) K ,. . . 

m > 0 i £ P 1 H G I A R > v I t S r ! V C ■ r T 7 H r ' 
* T R 1 K £ 1 M G « 0 C T L £ C H L L L D ~ >: y g \ * s ] 

» i : ■ .^/J^ 

gcccaataagtgagtccgcgtctcgaaa^ctgcgttgccgttcctcaatctcttactcatag : : ' 

C Y S L R K R A f . R N C K q t c N 0 
*~ V . », 5 ° A C L r 0 A T A S c . R , s , s \ L % R 

M *■ r oa05FLT C )R oavpesvgaacl'p^. 

7?0J "^.^ 

^ GG " AATG T CT 7CTT AAAAT T ACGCGAATACT TT T TC TTTCT TT T T T AAT G ATACC AT AAT TC GCT ACGCAT AA 7 ?6 ' 

end yscO* start yscP*

' A M S R R " L « R L ' K R K K K L L » * * ? j ' - 

i ?e i ^^"^ 

GGT7TCAA"CCCTTCAGA*CCCAACGGTACCGTCAGAATA^^ 

0 * " - 5 V A p P V L S C ■ • F. G C O T » " j 

v ^_ - ; L G L V C 0 5 J C D 0 u I A £ A r h K \ . 

1 ■ " «■ v ^ ^ c c h » s : • r k : : R R N p „ v K % L , 
Figure 11 
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GAACAACTCATCCACCAGCCATTACCCATTCCTCAGAATAATCCTCCTCCACCATTGAATAAGAACCTCGTrrTCACCCA 

I * • ' ..... ...... ♦ . } 4 t r, 

CTTG TTCAGTACG TGG TCCG T AATGGG T AACC ACT CTT AT TAGG AGO ACC T CGTAACTTATTCTTG CACC AAAAGTGCG T 

T T H f. P C 1 7 H w ♦ C - SSCS.E' E M G T h f 
tOLMHOALPICENN PPAALNKNVVfTO- 
NWSCTftHYPLVRI I LLQH* I R T W T S R w - 

^ AC CTTATCGTGTTAGTGGCGGT7ATCTTGACGGTGTAGAGTGTGAAGTATGTGAATCACGGGGGCTAATCCAGT^ 

7GCAATAGCACAATCACCGCCAATAGAACTGCCACATCTCACACTTCATACACTTAGTCCCCCCGATT ' " ° 

tL£ C'WRLS - R C R V • S M ♦ IRGANPvkii- 

RYRVSGGTLDGVECtVCCSGGLlOLR I - 
VlVLVAVlLTV-SVKYVNQCC'SS-t 

tcaatgtccctcatcatgaaatttaccgttcgatgaaagc^ 

7521 * * * * • ♦ 7600 

AGTTACAGGGAGTACTACTTTAAATGGCAAGCTACTTTC^ 

0 C P S S • W L F TOES AKAVA .GVSVAA V fl / - 

MV PKHClTPSHKALKOMLCSOLLH.fC - 
SM SLlMKrTVR - KR - S5GN5L5CC-iWG- 

TATATAATTTCtCTGGAGATATTCTATGTTAAGAATAGCGAATGAAG 

•*oi -- _ ___ , bd , 

ATATATTAA^GCACCTCTATAAGATACAATTCTTATCGCTTACTTCTCGCAGGCACCCACCTCTATGAACGTTGCCTTC 

end yscP* start yscQI 

VMr '«DILC'E*RMKSvRGWflrFO?iK 
Y 1 i 3 L E I r v ^ K N <: r . RASVGGDTSNA R- 

1 * r p w r v s r, l F l ft m eespwve Ilptoc . 

7681 GCGCTAC WTT ^ TGArcTGA ^^^TATGCAACAATATCCAGTACAGC AAGGGACA^ 

6 I ~ * * ♦ ♦ . ""60 

CGCG ATGGT AACCACTCGACTGT AACTC AT ACGTTGTTATAG GTCAT GTCG TTCCCTGT AATAAATGGTATTT AATAG TA 

start yscQI 

A 1- F L V S • H • VCNNIOYSKGHYLP* I j » . 
RYHK * ADIlYATISSTARD! JYHKL S -- 
A 7 i C E L T L 5 M O O Y » VQOCTLFT ] M 'r"*i - 

7 76 J AA7GACCTGGGTACCGTGTGGATT ^ 

TTnCTCCACCCATCCCACACCTAACGTCTTGTTACGACCGTCGCGACCACACTTCCCGATT AACCGTGGCGATT-CCTA*. 8 ' ' 

H s l; '■■ "> C C :. 0 " m A G S A 6 v k c • i. A P L ! 0 = 
*» * ' iVOCKTMLAALV- R A M W K R ""• *•-""• 
Mt LGPvwi AEOCWORWT -: GL l C T A n ~? * ? 

18 4 1 GGCTATCGATCCTGAATTGCTATATCCAATACCTCW 

CCGATAGCTAGGACTTAACGATATACCTTATCGACTTACCCCCGACCGCGGCAATAACGTTCGGTCACTACGTTGCG^G^ 



• c k r v m o f :• 



i- S LNCYME* LMGGWRi 

G > F < • IAIWNS-MGAGA \ I A S 'e ' C N 

A T 



A! D » ^LLYGIAEWGLAr iLU A SO 



7 9? J CTCAC/ ^ CGAGCCCCCA * CA7CC7CC ^ 

CAGTCTTCCTCGCCCCTTGTAGGACGTCATTAGATCGTGTAGTCGATCGCA-ACCTATAATTTACCTGTCAArTTCTCCT -". 

v R T S - 0 H P A V I t M I 5 • R ; L I. G 0 L r- s m 

S E h - -. N ! L 0 STTSASvt »• m 0 s . R " . . 

0 * I r P T S C S w L P H 0 L A I i, J k N 7 v E E m 

800 J GACTTCCATiA " CATTAT ^ 

CTCAASGT^TCCTAATAAAAATCTACCCCTTGCCC^^ 

s s i : r l ii g o p v r c a i - si; r l l s r. », i 



v p ' - ^ r v n a n g r r o y y k k * r 



c r m rTwpTCt' lrn 
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eoei r,ATTTATCCTCCCCCTCCTGTCCTACTCCCTC ™^ 

CT AAAT AGG ACGGGGAGG AC ACC AT C AGGG AC AT AT AAGTCCCACCACGGTCGAATGT AATT AGCTTG AACTC AG ATAGC $ ' " 

r ' J-PILW' J L . I 0 A G A 5 L H * S n L S | S 
nL S CPSCCSPCirRLVPAYlNftT - v,r_ . 
J IfPAPPVv vpVTfSGWCOLTI. ! E L C S I £ - 

8 1 6 1 ^ TC ^ ATGTCCCTTC ^ ATTCATTCCTTCM ^ 

TTTAGCCGT ACCCGCAAGCCTAAGTAACGAAGCCCCTGT ACTCTGAGCCAAAAAAACC^TAACTTGATWACCCCCTTAG 

K SAW A rGriAS ATSDSVrLlFNYLGCS- 
N R HG R S D S L LP- R H OTR FFCY S T 7 W c N t- 

iGMGVRjHCrGOIRLGrFAIOLPGGl 
82 4 1 TACCCA ^ CCTCTTCCTCACAGAGCATA ' AC AC ^ TG ^TTTGACGAATTACTCCAGGATATCGAAACGCTACTTGCGTC 
ATGCGTTCCCACAACGACTGTCTCCTATTCTGC7ACTTTAAACTGCTTAATCAGCT 

TQGCC - Q R I T R • N L T W • SR! S K R Y L R Q - 
aKCvA OftG-HOEI*RISPGYRNAT -V 

Y AKv lltcon?mkfoe:uvodi e t l l \ s - 



9 3^! A ^ AGCCCAATGTCAMCAGTCACK ^^ 

TCCCTCGCCTTACAGTTTCTCAC7GCCTTGCAGAACTC*^ ° * ^ * 

SAOCORVTERLOSMLSRyNNftCSLRS 
R E r N v K £ * R n V r S P T • A 0 T T T G A L • G R - 
C * PXSKSDGTSSVCLEOI P 9 c* V L F : 7 G - 

8 4 0 J GACGTGCGACTCTGGAAATTCGA ^^ ACC ^ 

CTGCACGCTCAGACCTTTAACCTGTTA^TCCTGTTGAATTTTGCCCCCTGCAAAACGGACATCC^CCTACAAV^CGCGGT 8 ^ ^ 

DVRVKKLDNYD KLKRGTrCL- VDVLRO- 
TCESGNKTITTT - M G G R F.AC R W m F C A R - 

RAS LC!COLRO'-KTCOVLPVGGCFAP 

e 4 a CACC7CACCATAACAGTAAAT ^ 

CTCCAC7GCTATTCTCATTTACTCGCATAATA^CCGTTCCACTCA^^ 

* ' K ' L * M T\'LiCKVS-LP VAMNLKr VL . 
G 0 D S K • f v V W A R • v 0 C L W 0 * ! i i A * 

' V T I R v n o R ; ; : ; 0 C C L I A C G H L f r. . : \ . 

fl5 . TACACC77C GT ATCTTTGCAAAAATACAGCGTiAf CCTGATAAGAAAAATAATATCCG=ACAATATAATAGCGTTCrAGG 

, ATGTGGA/iCC^TAGAAACGTTTTTATCTCGCATTTGGACTATTCTTTTTATTATACCCTTGTTATATTATCGCAACCTCC 

end yscQ*

* V G I r A K 1 Q B .►: f 0 K K N N M K T f $ R 

r 7 L * S L 0 K V S V f. L I R K I I C £ C : V W < V r G . 
T R K • - : t: K T * ' T • I K • V A N N 1 1 A f 0 v . 

TCGTGTCA7GAGAGATACAGTATGTCTTTACCCGATTCCCCTTTGCAACTGATTGCTATATTCTTTCTGCTTTCAATACT 

8641 " " o'.N 

AGCACAG7ACTCTCTATGTCATACAGAAA7GGG:TAAGCGGAA^CGTTGAC7AACCATATAACAV^GACGAAAGTTATGA 

start yscR*

5 c m t * y s e . r » g ; ? « l o l i -:■ ; :. «■ : L 5 : l ■ 
* v - w M P T V c L v p : v l C N • i. v \ c J c r c • c - 
v 5 ' £ I V t V F T P r A r A T 0 K \ i V S A r K 7 

6 ( ^^^^C7CATTA7CGTCATGGGAACT7CT77CrTT ^^ACT GGCGGTGG7ATTT7;;CATT7TACGA : \ATGC7CTGGGT*T7C 

CGGAGAGTAA7ACCAC7ACCCTTGAAGAAAGGAn7T7GACCGCCACCATAAAAGCTAAA.ATGC77TACGACACCCATAAG 



' - : v k c 7 s ; •.. ■ : /• v v r £ : i. h ». a l g • 
L -'• ? w «_ i. l l •. w w w y r 6 r : r m L K v 

2 '* ' 1 " c n r r - ■ • g g o i r : r t > : ; c ^ 
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AAC7V.uT<;CLXCCAAATATCCCACTCTATCCCCTTCCCCTTCTACTTTCCTTATTCATTATCCCCCCCACCCTATTACCT 
TTCTTCAGCCCCCTTTATACCCTCACATACCCCAACCCCAACATGAAACCAATAACTAATACCCCCCCTCCCATAATCC/. 



OvppiJi* L yciALVLSLriMGPTlL; 
h S ? Q \ SHCmALRL T F P Y 5 L w G R R Y ' l- 
~ S P > KYRTVHPCACTfLI HTGADAlSC- 



GTAAAACACC C CT CCC ATCCGCTTCACCTCGCTCCCCCTCCTTT CTG GACC TCTGAGTCGC AC ACT AAAGCAT TACCCCC 
CATTTTCTCGCGACCGTAGGCCAAGTCCAC«CGACCGCGAGGAAAGACCT 



V K £ R K h PVQVAGAPFWTSEWOSKALAP- 
' KS A GIRfRSLALLSGRLSGTVKH' R I - 

KRALASCSCRNRSTLOV* V G Q * S I S A 

TTATCGACAGTTTTTCCAAA/AAACTCTGAAGAG^ 

6961 — 9o.o 

AATAGCTGTC-AAAAC C I I I [ H I 

Ya Ori.OKNSECKEANrrRKLlKRTWc E . 
iOSr CKKTUKRRKPI 1 F G I • - K E P C L 
L 5 T V r A K R L ' REGSQLFSE F 0 K T M L * * 

AAGACATAAA-AG;^GATAAAACCTGATTCTra 
904 1 ♦ , . . t # 

TTCTGrATT7T?~?CTATTTTGGACTA^ 

D i KPK tKPOS.LLl LI PAFTVCQ LTQi 
* T • K I S ♦ M L I L C S T • FRHLR - v S • R R ' H 
R H K K K 0 K T • F F A H 1«SG I YGESVNAG1> 

TTTCGGATTGGATTACTTATTTATCTTCCCTTTCTC^ 

9,n - 1*00 

AA^GCCTAACCT.-ATGAATAAATAGAACHKJAAAGACCGATAACTGGACGAATAAAGTTTATATGACGACCGATACCCCTA 
? * * G - I. I YLPTLA1 DLL1 SMI L L A M G M - 

r s i e- r l r i fpfulltclfoi y c w l w c . . 

5 0WirYLSSLSGY*PAYFKYTAGYGD 

gatgatgctgtcgccgatgaccatttcattaccgtttaagctgctaatatttttactggcaggcgcttgggatctg^cac 

9?01 — • ,» ao 

ctactaccacagcccctactggtaaagtaatccc^ttcgacgattataaaaatgaccctccgccaaccctagactgtc 

M " V S ? M ? I SLPTKLLI rLLAGGWDL-L- 

w c^?»PFHYRLSC'YrywoAVG:'H. 
d d g v a •:• n h r i t v • a a n r ftgrrlgs:t 

^ TG ^CCA\7TGa:ACArjVGCTTTTCATGAATGATTCTGAATTGACGCAATTTGTAACGCAAC7?TTATGGATC3TCrTT: 
ACCGCGTTAACr^TGTCTCGAAAAGTACTTACTA-.CACTTAACTGCGTTAAACATTCCGTTGAAAATACCTAGCAGGAA 1 

end yscR* start yscS*

A C L V ; 5 F S : MlLN'RML'RNf^GSS: 
WRNW VKAFH£* F * IOAlCNATFMDRc' r . 

c A 1 G T C «- P tt P P £ & L T0rvTOLLW|v*Lr- 
9 36 TTACCTCTATCCC3GTACTCTTGCTCGCATC ^^ 

AATGCAGATACCGCCATCACAACCACCGTAGCC.-.TC.AACCACAGTAGCATTCGGAACAAGTCCGGAACTGAGTTTATGTC ^ ^ 

^ " L C ' •: W W H R • L V ' S S A. L F R F * J. K ^ 

V ^* 1 AG S V G G I G i K C M R k n C S G L D S K - g 
T S M p v v L V A S V V G V 1 V S L V 0 A 1 T Q ; ° 

GACCAAACGCTACAGTTCATGATTAAATTATTGGCAATTGCAATAACCTTAATGGTCAGCTACCCATCCCTTACCGCTAT 

" " 4j»f, 

CTGGTTTGCGATC T C AAGT AC TAAT T TAAT AACCGTT AACGT T ATTGGAAT T ACCAGTC GATGGGT ACCGAATCC CC A T f- 

T * R T S f. • K l « 0 L 0 ' P'WSftTNGLAvv. 

^ n A T v h n ' !!GKCMnlhGOLPha-P« 
DC' L0rM|KLLAIA!TLHVSYPKL3G : 

Figure 11 
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CCTCTTCAATTATACCCCCCAGATAATGTTACCMTTGCACAGCATGGTTCAATGCCACAACAGCTAAATGACTCGCTTA 

9800 

CGACAACTT^TATCCCCCCTCrATTACAATCCTTAACCTCTCCTACCAACTTACCGTCTTGTCCATTTACTCACCCAAT 

end yscS* start yscT*

C * I 1 PGR" Crf.LESMVF.WHNR- M S G L 
PVtL VPAOHVTWWftAWLNGTTGK* V A Y - 
LLNYTRQIMLft ! Q !*■ H * M A P 0 V f E H L I - 

TTGCATTGGCTGTGGCTTTTATTCGACCATTGAGCCTTTCTTTATTACTTCCCTTA 

AACGTAACCGACACCGAAAATAAGCTGGTAACTCGGAAACAAATAATCAACGGAATAATTTTTCACrc 

LHWLWLirOH'AFLYYFPY* X V A V * Q P - 
C I GCGfYST ! EPFFIT5L I KKWQFRGR- 
ALAVAFI RPLSLSLLLPLLKSGSLGA 

GCACrrrTACGTMTGGCGTGCTTATCTC^CTTACCTTTCCGATATTACCAATCATTTACrA 

96BI — - ♦ - 9160 

CGTGAAAATGCATTACCGCACGAATACAGTGAATGGAAAGGCTATAATGGTTAGTAAATGGTCCTCTTCTAATACTACGT 

HFYVMACLCHLPFRYYQSrTSRRL* CI- 
TFT* WRAYVTYLSDITNHLPACDYDA 
ALLRMG VLMSLTFPlLPIlYOOKlMMM- 
Tn insertion P9EP 
il 



9761 



TATTGGTAAAGATTACAGTTCGTTACGGTTAGTCACTGGAGAGGTCATTATTGGTTTTTCAATTGGGT7TTGTGCGGCGG 
ATAACCATTTCTAATGTCAACCAATCCCAATCAGTGACCTCTCCACTAATAACCAAAAAGTTAACCCAA 



98<0 



tVK«TVC'C-SLER-ULvrOLCrVRR 
V W • RLQLVRVSHWRGDYKrFNWVLCGG- 
IGKCYSWLCLVTGCVI ICFS1GFCAAV- 



9841 



TTCCCTTTTGGCCCGTTGATATGGCGGCGTTTCTGCTTGATACTTTACGTGGCGCGACAATGGGTACGATATTCAATTCT 
AAGGGAAAACCCGGC AACTATAC CG CCCCAAAGACGAAC TAT GAAAT GC ACCGCGCTGT TACCCAT GCT ATAAGTTAAGA 



99ZO 



rpFGPLIWRGFCLI LYVARQWVRYSIL- 
S L L G R • YGGVSA* Y TTWROMGY DI Q F Y - 
PFW AVOMAGrLLDTLRGATMGTirNS 

ACAATAGAAGCTGAAACCTCACTTTTTGGCTTGCTTTTCAGCCAGTTCTTGTGTGTTATTTTCTTTATAAGCGGCGGCAT 

9921 „ -- 1O0OC 

TGTTATCTTCGACTTTGGAGTGAAAAACCGAACGAAAAGTCCGTCAAGAACACACAATAAAAGAAATATTCGCCGCCGTA 

0 * KLKPHflACFSASSCVLFSl* A A A W - 
N R P • f* I 7 F W l A f 0 P v L \ C Y F L i K R R h 
TIEALT^LrCLLFSprLrvi FFISGGH 

GGAGTTTATATTAAACATTCTGTATGAGTCATATCAATATTTACCACCAGGGCGTACTTTATTATTTGACCAGCAATTTT 

10001 " " 1008C 

CCTCAAATATAATTTGTAAGACATACTCAGTATAGTTATAAATGGTGCTCCCGCATGAAATAATAAACTGGTCCTTAAAA 

SLY - T F C M S H 1 N I * H 0 G V L Y Y L T S N r 

g v t i ¥ h s v ■ visir?TRArr-i I • p a i r - 

C F J L u i LY CSYOYLPI'CP TL L F D 0 0 F L • 

TAAAATATATCCAGGCAGAGTGGAGAACGCTTTATCAATTATGTATCASC7TCTCTCTTCCTGCCATAATATGTATGGTA 
1008J - - - j0i6C 

A TT T T AT AT AG GTCCCTCTCACCTCTTGCG AAAT ACT T AAT ACAT AC T C G AAC AG AGAAGG ACGGT AT TAT AC AT ACC AT 

* n rSRO SGERFiKYV'SAELFLP - Y v w y - 
K t vrGRVENALSIHYOLLSSCMNMTGI- 
KKSOAEWRTLYOLCISrSLPAIICMV 

TTAGCCGATCTGGCTTTAGGTCTTTTAAATCGG7CGGCACAACAAT7GAA.TCTGTTTTTCTTCTCAATCCCCCTCAAAAG 

l0: <: 

AATCGGrrAGfCCGWTCCAGAAAATTTAGCCAGCCGTCTTGTTAACTTACACAAAAAGAAGAGTTACGGCCACTTTTC 

» : v, i - v r • j »*» 6 m n n • m r r s s o c r s k v . 

i" •« * C T r S V T 7 I * v f L L W A A Q 

L A I) i. \ L G L L M R S A 0 0 L « » F I F S M P L \ S • 

Figure 11 






EP 0 889 120 A1 



TATATTCCTTCTACTCACCYCCTCATCTCATTCCCTTATCCTCTTCATCACTATTTCCTTC^CCCAT^TTTTATAT 

10241 - - , 0 »0 

ATATAACCAAGATCACTGCRCCACTACACTAACCCAATACGAGAACTAGTCATAAACCAACTTTCCCTATTTAAAATATA 

Yw rr-ft|>OLlPLCSSSLFG-KR'! LY 
' I C S T D ■ LI Sr?YAtHHrtVCSOKf? | 
I I V L L T "» • SMSLMLFITIWIKAINTI r - 

TT ATCT AAAAGACTGG TTTCCATCTC T ATG AGCGAGAAAACAGAACAGCCT AGAG AAAAGAAATTACG TC ATC GCCGT^ 

10,00 

AA7 AGATTT7CTG ACCAAAGGTACACAT ACTCCCT CT7T7GTCTTG7CGGATGTCTTTT CT T7 AATCC ACT AC CC GCATT 

end yscT  start yscU

^SKRLVSIC H. S f K T r 0PTEKKLR0GRK- 
T L K D W r o g y » ARKONSLOKRwrVM^.VR- 
l"KTCrHLYC«CNftTAYRKEIT*Kp* 

GGAACGGGAGGTTGTCAAAAGTAT7GAWAAC*7CATTATT^ t 7TCTT TA 

10401 — . ,0480 

CCTTCtCGTCCAACACTTrTCATAACTTTATTGTAGTAATAAAGTCGA^ 

CGQVVXSI £ ITSLFQLIALYL YfMFrT- 
KGRLSKVLK*HHYrS*LRriC ir;S L 
G R A C C Q K Y • w r: I I I 5 A O C A L TVFS FLY 

CTGAAAAGATGATTTTGATACTGATTGAGTCA^TAACTTTCACATTACAATTAGTAAATAAACCAT7TTCTTi?u^ATTH 

10481 - -- — — ----- : ,o*6o 

GACTTTTCTACTAAAACrATGACTAACTCAGTTATTGAAAGTGTAATGTTAATCATTTATTTCCTAAAAG^? ^CGTAAT 

t K M 1 LILItSITFTLQLVNK PFSY-L 
L K R • F - Y L S Q - LSHtN - • I W H F L M • - 

• K00FDTO - V r* N F H I T 1 S K • T I F !_ C ] N - 

ACGCAATTG AGTC ATG CTT T AAT AC AGT CACTGACTTCT GCACT GCTGTTTCTGGGCGCTGGGGYAAT AGTTCCT ACTGT 

l0b61 ~ - -V I06 , 0 

TGCCTTAACTCAGTACCAAATTATCTCAGTGACTGAAGACGTGACGACAAAGACCCGCGACCCCATTATCAACCATCACA 

TQLSHALJ cSLTSALLFLGAG^l v A | v - 
Rw * VML** S H • LLHCCFWALG- - L L L W - 
A JESCFNR\'TDFCTAVSGRWCNS?VC 

CGGTAGCGTGTTTCTTCAGGTGGGGGTGCTTATTCCCAGCAAGGCCftTTCGTTTTAAAAGCGAGCATATAi^TCCGGTAA 
10641 ~ - !0 7?O 

ccca7cgcacaaagaag7ccacccccaccaa7a=cggtcgttccggtaaccaaaattt^^ 

0 S v " 1- 0 V G V v I i. S K A 1 G K S E h ; i. : 5 _ 

v '*CrrRWGWLLFARPLVLKASl- :„ 
' * V S S G G G Y C 0 0 G H W r • K R A. i K "f - k - . 

GTr>.ATTTTA.AGCAGATATTCTCTT7 ACATAGCGTAGTAGAATTATGTAAATCCACCCTAAAAGTTATCATGCTi TCTCT"' 
10 - 4 i • 10800 

CATTAAAATTCGTCTATAAGAGAAATCTATCGCATCATCnAATACATTTAGGTCGGATTTTCAATACTACGATAGAGAA 

,v rKvirSLHSVVELCKS£LK.| M < 
V! LSRVSLY IA - • N Y V N F A * KLJC ' L- 

* P I L F 7 • HSR I M • 1 0 P K S "h : ; " S , . 

ATCTTTGCCTTTTTCTTTTATTATTATGCCAGTS.CTTTTCGGGCGCTACCCTACTCTCCCTT^CCTGTGC-G'-CTTr' 

10801 - - - , " " , A , 0j 

. JOdflC 

TAr.AAACCCAAAAACAAAATAATAATACGGTCATGAAAACCCCGCGATCGCATGACACCC^TCGGACACCCCACGAACA 

: r ^ r r f r r i a s : r r a l p r c g l ■■ c - . i ■. 

SL r TS F ! I « PV LTCfl Y At v G • ? v A C !. W 
LCLfLLLLCOl f S G A T V L W v i L w ?r a "r 

GCTTTCTTCTT7AATAAAATGGTTATCCGTAGGGGTGATGGTTTTTTATATCG7CGTTGGCA7ACTGCAC7ATT^TT-T- 

10881 : I096i . 

CCAAAGAAGAAAT7A777TACCAA7ACCCA7CCCCACTACCAAAAAA7A7AGCAClCAACCC7ATGACCTGATVCA/\^0 

V S S I K K L w v G . H v r Y 1 v v G I L t> J 5 ' « 

F L !. * • n G Y 3 • '.: • w F F I S J L A I w t ! * ' ! 

r. f r f n m v k r, r :. n g r l y *? r w h t " c. - -■ 
10961 ^ TA ™ T ™° ATTAC,VWOTATCTAA ^ 

TTATAATATTC7AATC7TTTCGATACATTTTTACTCATT TCTACTGCATTTTG7CCTCGTATT7C7 ACACCTCCCGCTGC * * 0< ° 



T r K 1 * *' I ■ K • V K M T ' NRSIK1WRAT 

14 m 1 1 R _ L * LSKNC - K * R K T G A • R S C G R P - 

P - 



1 L * °* K S 7LKMSK0DVK0CMKOLC C O 



Tn insertion P12Fb 

c 

II 04 I CTCAMTGMCACGCCGCCTCM ^ T ^^ GTG ^^ 

CAGTTTACTTC7CCGCCGCAGCCTTTACGTCTCACTTTATGTTTC^ ' ' ' ? ° 

LK " » « 5 V C N A £ •. HTKWtrsSIC - T I C C - 

S N f. daa.sehqse I o S G s laosvkqsva- 
Qhktrrrxcrvkykvgv* LNLLNNLL 

u m cc aacgcat at tgcggtttgtcttg gctatcat cccacccatatgccaataccacgcctcctggaa 

gccatcacgcattaggttgcgtataacgccaaacagaaccgatagtag 1 1 200 

GSA ' Sn **cglswlsshryanttr PG k- 

VVRw »'TH 1 AVCLGYHPTOMP|prvlE - 

R ' Cv *OaiL»rvLAliPPic<?yMASHK- 
i -o: , AAACGCAGTGATCCTCAACCTA ^^ 

tttctctcactacgacttcgattgatataacaattctagccacttgccttgacgtagg^m^^ 1 1 230 

R O 'CSS-LYC-HR'TQLHPRC-KC-A - 
KGSOAQANY iVMIAERNCI PVVCNV£ L . 

kav ^lklt:lltslnataspllkmlsw- 
ccgggcgagtaataaaaa^ttcaccttgc<^ctctattttam 1 1 360 

G A P R ! . 'r \ S C 7 R " * N 5 * N V 1 * T R C 5 ^ T - 

P A h v - ■ ' V ' R ° D K 1 P t 7 L F E PVAALLR- 
P A H Y . t K IV N A E I KFLK RV Ln P L 0 P C Y 

CATACCACTACTTCTATCTMTACGM^ 1 ' ' * ° 

end yscU

\ °V C „ \ P , 1 C - A Y R " T 1 M K r " M L L 0 A T A ► - 

D 1 ^ H y T r T o - MLLVCrrRPLR 

V * " S * I M R ' - rKHHK"FW| ASSGlfCE 

1 |4 4 J A ^^_ GAGGGT ^ 

tccwtc7cccattatcgca7atc7cctcacgaactcctat^ 1 1 " ° 

V K p ' : AYRAVLDDKCCKLKI I A T S L A 

' ' ' -OC LT I Kvr D" K • s L U A w H - 

c E L N 5 v * s s a - r • p . i T r. N N R f • p c 7 . 

] i . C ^ GCAC _ C _ A . 0A7ACCG ^ 

gttcgtggtctatcgcat^tattttaattw^^ 1 1 ° oc 

CAr -^ v i--'- , 'TR.WiCAStWTRTT 



R P p 

L N & i r oi 

5 T R ■ > I . I K L N K I M 0 W C V 



K H _ 0 : * v > k i koonglvrlng l e p l o p h 



MDSNHS7P 

I 1 60) ACGA / G ^ 

TCGTACAGT7CCACCACCAGATTCC7TCACTCGATACTTCCCGTT^ 1 ' "° 

P C 0 C A :. " r; • *MNG«VV'. i. NGOEV - r h 

T h S a - c s m o l ? r l > n « c , . 0 R c R , L S A . 

Figure 11 
AC AACCGCAAT G AGGC AAGAGGG AAATCGCAATTTTCTT CCT GAAAT CACC TGATTGCC GTCGAAATATGCAACATCT CC 
TCTT GCCCT TACT CCC TTCTCCCTT TACCCTTAAAAC AAGCACT T TACTCC ACT AACCCCACC TTTAT ACCTTCT ACACC 



WRN C ARGKSOrsS* W ML1AVCICMMS 
TTAHRQCGNRHrLPr. IT* LRWKYATCft 
0 P 0 * G K ft r. I A I rrLKSPOCCGNMQHVE 



AGAAAATAGCCGCCATCCGACGGCrATCGTCGTATTATCGGAGCGCCCTGCAAAATGATC^CGGACGGCTGACCTTGTAG 

* ■* — ♦ — -— - - - « 

TCTTTTATCGGCGGTACGCTOXGATAGCAGCATAATACC^ 



* •< ' PPCDGYRR I 1GARCKMHA0G - ft C R 
E "S»MATAIVVISCRAAK« WRTADVVD 
* I ^^MRRLSSYTRSALOWOGGRL.TL* 



ATAGCCCATCCGTAGCATCArTAACACCGCCGCCGAGOT 

t~tcgcctac^catcgtagtaattgtcg^gcggctco^gt^^ 



* irsi I mtaaevr pmmmpi ok p a g p i . 

.- : -S - SLTPPPRSGR* ' TPSR5LPVP 

I A M ? « HH. H p RRG Q ADor pH p CAC|tSH . 

tacgatccaccaccaaatccgttaacgccagg^tataaccgc^ 

1192s - - ■» -----••-------.-»_«.. -.-.«__4._. ^ 

ATGCTAGGTC^TGGTTTAGGCAATTGCGCTCCTATA^ 

,STT ^SVN ARl-PLGKPNTQ'AVK yj 
: ^ P ??NPLTPGYNRWVNLTPSRR« 5-. - 
7 1 ''HQI ft R<?0! TAG* T HPVGGKGDK- 

1200] AA ^ GATGCAACGCCTATCm 

TTTTTCTACCTTGCGCATAGAAATArTG(X<X:CTCTTATGGCGACGGCT ATTGGAC ATATCTCAGCCTTTAGACCATTTG 

KKMC :;v SL - P RRIPLPITCIESClw- j 
K R k n A Y LYHRACT RCR* P V - SRKSGKp- 
K " 0 Q , ? I riTAQNTAADNLYRVCWLVN 

1208. CGCA0CG tGCAGCATT WTTGCGGCAAGCCCCAC6ACCTCAG CCTTGTCATTCT AGAGCAAAGCAATATGCTTACGCAGA 

GCGTCGCTCGTCGT^TTAACGCCGTTCGCGGTGCTCGACTCCCAACAGTAACATCTCCTTTCCTTATACGAATGCGTCT 1 ? ' 6<> 

* A S S ;NCGKRHDLRVV| VtQSNHLTOS 
^ R -" t!A ASATTSGLSL-SKA!C!.^« 
S £ v - • LROAPRPOGr M CKAKOrA^A<- 
! ? i 6 C7AACCCT W ^ TAGCGGT A^CCACACCCATACAMTCCCGACGCCTAAACCGGTACGCGCTGGGTTTGCGCATCCAGC 

CATTGCCATTTTTATCGCCATTGGTGTCGGTATGTTTACGGCTGCGGATT^GGCCATGCGCGACGCAAACGCGTAGGTCC * ? * * 0 

N G K S G W KSHTNAOA* TGTRCVCA«:s 

V 7 V 1 ; A V T T A 1 0 M P T P K P V R A A T A H * . 

" * ' ' K ' POPTKCH«LNRYALRLRlQR. 

122 1 CTTGACCC " TCCCCCAGACCGATAACCCACTCGAATCCTTAC ^^ 

CAACTCGGGACCGGGTCTCGCTATTGGGTGAGCTTAGCAATGGCGGCCTCGGTCGCTGTAGCCGTCATGCTTCTACTCGC ' ' ^ 

v ^ pw :> «PITHSMflYRRS0RhROyCMOR 
1 5 f G ? 0 R • PTRlVTAAASOIGSTwjsj;. 
*l a CTDMp|. CSLPP0PA7SAVPT5A 
I 232 AOTAAACTT ^ GCCCAATCTGATC ^ 

TCGATTTCA^TTCCCGTTAGACTACTGGCCGCTGTAGGTGTTATGGATTACCGCTTTCGTCGTCGCGTTGCTGGCCTTTA ? ^ 
AKVRRMLHTGDI HMT" WRM0^ RNDfi|< . 

!-«LiA!--PATSTIfwGt7S5ATlAH " 
5 * S " -OSOORRHPOTIHAKPAAORDQ, 
1 240 AACCTCACTTC --^ G ^ CAGCCACCCCAATCGCCAACrc 

. _ «, . * ** * * " ♦ • - ] > 4 8 (• 

TTGCAGTGA^GTTTCTTGTCGGTCGCGTTAGCCGTTCGGGTCAACTT AGTCCGCGAACTACTGCTGCGATAGCCCAAACG 

S " I* » LOPAOSATI'VCSGAS - RRyRvc 
NVTS - N iOPNROPyLf. OALHOOAlGr A . 
Ts LO"TASA!GMPS* I RR TmYTLSG l P - 

Figure 11 
C^AAT.CCTTTTTCATTACCAATATCACCCATTCAACCCGCCTCTTTAATCTAjftCAAAGCATGCCCAT^AACATCACCCAA 
CTTTCGCAAA^AGTAATCCTTATAGTCCGTAACTTCCG^ 

OSL f'HVCVNALNARV* CKKAWR* T S P n - 
K A r TITN I T K * TRV F W V R K HGOKKMP | . 
KPrSL«ISRIGRACLM'ESMAINrTC 

tacacccccgcagtcgcaacgccgcagccgataccgccgacttccggcataccaaaatggccatagataaaaatatagtt 

- • — — .76,0 

atctggcggcgtcagcgttgcggcgtcggctatggcggctcaaggccgtatggttttaccggtatctatttttatatcaa 
r ppo/sqrrsrt rrvpayqngmr* k y s s - 

ORRSRNAAAOTACrRKTKHAIOKHlV 
* TAAVATPQP1PPSSG1PKWP* I K I • r - 

CACCGGAATATTCACCAGCJ^GGCCCAAAAATCCCATCACC^^ 

GTGGCCTTATAACTGGTCGTCCGGCTTTTTAGCKTAGTGGTATGG^ 

PC YSPAG PKI PSPYPVWFWPOLRTGr 
HRw I HQQAO^SHHHTR rGFGOTfALVS 
TG ! r T S R PKMPITI P G L V L A R P 5 H W ~ R * 
GCGCTACCTGAA^GAAAAGGTATCCTGCGCCCCACAGCAGCGCGCGAAGATAACCCACGGCTTTATCGGCCAGCGCCGGA 

— * 17600 

Ct^GATGGACTTTCTTTTCCATAGGACGCCGGGTCTCGTCGCGCGCTTCTATTGGGTGCCGAAATAGCCGGTCGCGGCCT 

ALPtRK Gl LRP1AARCDNPRLYRPAPD 
RY iKrKVSCAPOOftAK] THGflGQfifi • . 
AT* h K R Y PAPMSSARR* PTALSASAG 

T CAA T ^.7T ATGC AT AG AGCGG AT AATGT ATCC GGC ATTC CACAG GAC G ATC ATC AC CAGCACG GACAC AAAGCC CGCC * v» 

U*0* 1. |» ? 80 

AG7TATAATACGTATCTCGCCTATTACATAGGCCGTAAGGTGTCCTGCTAGTAGTGGTCGTGCCTCTGTTTCGGGCGGTC 

QT - A' SG'CIRHSTGRSSPARROSPPA- 
NlMHRAONVSGIPODOHKgHCOKARO 
SI LCI lRIMYPA THRTI ITSTCTKPAS- 

CCAGA.^CLTTTGTCGAACCTGATGCGCGATACC^TCACGACBGCCGGAGCCATTGAGTTGCGCAATCACAGGCGTCA.^CC 

GGTCTTGGGAACACCTTGGACTACGCGCTATGCGAGTGCTGCCGGCCTCGGTAACTCAACGCGTTAGTGTCCGCAGTTCC 

R T LvCPOAfl YAHOGRSH* v A Q S Q A S - 
PCPLSNLNRDTLTTAGAI CLRnHRRc-;,- 

0 f* c R T • CA:R5RPPCPLSCA|TGVKA 
CCAGCAGTAAGCCGTGACCAAACAAAATGGCGGGAAGCAGATAGAGGTGCCGATAGCGA.CGGCAGCCATGTCCGT*G r G'" 

l?961 - - - l»K 

GGTCGTCATTCGGCACTGGTTTGTTTTACCC^rCCTTCGTCTATCTCCACGGCTATCGCTGCCGTCGGTACAGCCA.TCGC ^ 

PA\'3C00TKWRCA0RGADSDGSHVR j i . 
00* AVTKONGG KOIEVPIATSvAMSV'i,. 
% S r. r ■ PNKMAGSR* R C R * PROPCP - ; 

TATAGCCTCCCGCCATGACGGTATCGACGAATCCATTGCGGTCTATACCACTTGCGCAAGCATCACCGGT'VTCTGAArc- 

l30Jl " — iu?o 

ATATCGCAGGGCGGTACTGCCATACCTGCTTAGGTAA.CGCCAGATATGGTGAACGCGTTCCTAGTGGCCATAGACTTGCC 

1 A S R K 0 G 1 DESfAVYTTCAR ' T G I ■ T L • 

PkAHTVSTMPLRSIPLA<?GSP\/Scf* 
TSLPP* RYRR lHCGLYHJ.HKOHRri.NA- 

TAATAACTGACCCGCTTCACTGCTATACTTCTGCACGTATTCACCTTTTATTTTCTTCTTATATGAAAGACTAAAAAGCC 
i 31 ? I , „ 

A ^ATTGACTGCGCGAAGTGACCATATGAAGACGTGCAT AAGTGGAAAATAAAACAACAATATACT 

'TOALHWyTSARjMLi. rcCTMKD - KA 
* * L T R f T G I LLHVFT!"I I" v v | • KTKK.F- 

N K * 9 h ' - v * r c r s p r : t m i i * c ? l k " s u . 

^ OCCGAAGTGGCAGCCAAAAGAAATAGCAGGGGAAATTTCAGTCTATTGTAGCGGGGTATTACTATTTCTCCAGTGAAAAA 

ccgcttcaccgtcggttttctttatcgtcccctttaajAgtcagataacatcgccccataatgataaacagctcacttttt l378C 

A tVAA >HW5MCKrSLL - RGI T ! S ' P V K f 
P K W Q P t i AGG I SVYCSGvl L I* L 0 ' K h 

" S C S O k k • QGktosIvaCt » » I S S £ * 

Figure 11 






l 32B1 A ^ A ^ TT ^* TT '' VA ^CC^CATTCCTGCCAACCTCTTTTTCCACCTCCT ATTCTCCTGAACACTTCTCCTTTTATTTATTTCA 

TCTCAACAATTCCCGCtrTAACCACCCTTCGACAAAAAWTCCACCATAACACCACTTtJTCAAGACCAAAAT 1 

0LLTAM CW O AvrpPAlVLHSSAri y f R - 
SC * RR »ACKL{ rHLLLC*TVLLLr«S 
TVVNCALLASCrSTCYCAEOrcrr LFO- 



( CCACTTCAAGATATGTTTACG<^ATCGTACAGCGTArCGCGAAAC7CGTATCGATA 

CCTO^TTCT-ATACAAATGCCCrTA^TCTCCCATCCXXCT^ 

S * ^YVYGORTGYft ETGID 
CveOMrTGlVOCTAKLVS! 
CLK 'CLRGSTRVPRNWYR 



Figure 11 
DNA sequence of VGC II cluster C 

Tn insertion P9B< 
8 

CCTAGGAAAAAGAAATT ACGACGA TTGCAAAGAACGTTT TACGCAACTACT CT AACTAGG TCATCTGCTCACTATTCTT T ^ 

Tn insertion P7A3 

TCTCGOlGCGTAACCGWWTKKWAHVW^ 

1 6 1 

TCTGACTACTCTCACTGATTA(^CGGTCACGTTATTGCCCCCTTATACACG^ * 0 

Tn insertion p^ft 
0 

GATCCATTTGTATATCATt^TGAATTAACACGCTCCCCGGCCCTT 



24 1 1 1 ■ a^/^ul i uti.uut^CCTTCGCTGGATACTrCAGGKTNSSGCTAACCCA7TTTT 

CTAGGTAAACATATAGTAXrrACTTAATTGTGCGAGGGGCCGGGAAGCGACCTATGAAGTCCT 
ATCPA^ 



321 



TAGTTTTGTAGGACGTGAAGAGCATGGrrATTCAGTACTGTCTAATGTCGTAGGGCT^ 4 °° 
AGTCGCTCTCACCTT^ 



401 ___________ 

tcagcgag.^tggaaaacgtagacaagcgaactgctcgttattggcctg 4 80 

48 1 f^l^ 

GGGCGTGTATTACTTATAACCyVAAACAGATTATTTTTGAA^ 550 
S6 1 "j? J^fT^ 

GATTATGAATTGTCCTGTGGGTAAGGTGGCTACTTTTAGTTCTTATGC 6 <° 
64 1 - C -^ T _^ 

CCCATAAACTATTAGTCCTTC^ 

TACCATCGCCGAATAAACCTAATTTAGACGCCGCTAATTGAGATTGAGACCGAAA 3 °° 
CGGACAAGAGAGTCTTATTAAAAAAGTAAATATCGGTCGCTTATGTTTATAGCGT^ 880 

ee i 5^7_ c . A 77^f5/;\\^ 

GGAAGTAAGTCGGTATGAAGGGCCGGAACATTTTGCACTGGATTTTTTGCATAAAAG 9$ ° 



TTGGTA 

9 S i _ C .^_ A 7. A ^^^^^ 



CTCTATACGCT^.T^TC^TGACTCCC^TTAGTTTTTTTCCCACTAATGTGATACATGAAaC^ 

1041 



1040 



AACAAATGA^^ 



TTCTTTACTGGATGTTGTCCTTATAGCCGGTTATTTCCCTAAAACAAAACCCTCACCT^ 1 1 ? ° 

1121 A??!?^ 

■ AGCGTCGTTA/^TCTGAAACGGGCCCCTTATTACACCGAACCCTTNGYTAAAG^ 1 ?0 ° 

CANAACCRAA^NiATAGTGAAAGW »™0 
1281 ^ C ^ C ^ 

TTC ^GCTCCA C .7TCCGCW^ ' 560 

Sequence 2 

Figure 12 
TATACAAAGTAATATTTTTATCCACCACACCCTCCATATTATTTAAAGTCACCACAGATCCCTCGCAAACTACATAACCC 
1361 ♦ * • ♦ » 1440 

ATATCTTTCATTATAAAAATACGTCGTGTCGCACCTATAATAAATTTCAGTGCTCTCTACCCACCCTTTCATCTATTCGC 

TGACAGCTTTTTTCCAGGGCATTCAGACGCACCATWGTTTGAGGTATCGCTGATTACCGTTGA/INAACCACTAGCACC 

14,1 ' " " - — - ,S20 

ACTCTCGAAAAAAGGTCCCCTAAGTCTGCGTGGTATTTCAAACTCCATACCGACTAATGGCAACTKNTTGGTGATCGTCG 

ACCGTCAT TC AAACCTGTATTGAACGCAATTTTCTT CCCACCCAGCGACACTG CCGTT CCCCAGT CG ATGOCTAACTGGT 

1521 * * * * * ♦ ♦ -----4 ifiOO 

TGGCAGTAAGTTTGGACATAACTTGCGTTAAAAGAACGGTGGGTCCCTGTGACCGCAAGGGGTCAGCTACGGATTGACCA 

TAATATCTCCAGCATTAACATCGATAATTTTCACCCAAATCTCTATCATCTGCTGGCGTTGATCTAATTCTGTGATGAGT 
1601 ♦ + ,, ...... 4 4 16fl0 

AT T AT AG AGG T CG T AATT GT AGCT ATT AAAAG T G GCTTT AG AG ATAGT AGA CGACCGC AACTAGA TT AAG AC AC T ACTC A 

TTCCGATACNNNGCCATATTGGNNNCATAATCACGAACX5ATCACTGCATT 

♦ ♦ , ♦ — # j l60 

A^GCTATGWNNCGGTATAACt^NNCTArTAGTGCT^ 

n 6 1 ATGCCTCTGTAGCGGCTGAACGATTC ^ C ^ 

TACGGACACATCGCCCACTTGGTAAG*^ 1 6 * ° 

1841 ACGCCT ^ NNAACCACGACGGACTGAT 

TGGGGACCNNTTGGTGCTGCCTGACTAGCGC7 ATAACCATGACCCATAGGT AGCGTCACCGTATGAATTCGCACATATAT 

CTTACACTCACCGCACTGTCTTTTCGTTTGATTAACGCATTATCCAGCACTGAAGCTAATTGACTA^ 

GAATGTGACTGGCGTGACAGAAAAGCAAACTAATTCCGTAATAGGTCGTGACTTCGATTAACTGATTATGCT 

Tn insertion P?G2 
0 

200 1 GCTGGGAAC * CCCCT ^ CCTC ^^^ 

cgacccttgtgccgagtggaggtgtcgaaaccatggccattaaagaaattggagcgtag<;gccactactttcct 2080 

2oa i GGCTGCGTAAGTAATGAATGAArc ^ 

ccgacgcattcattacttacitggc^gtcatctatt^ 2160 

2161 TA7 _ ACATAT ^ CATGCTCCCATCAAAC ^ CGTAAG 

atatgtatattgtacgacggtagtttggtccattcctttagtataacacgaccctccaataacttttatacctggccacc 2240 

??4| TCCAGCCCGAATTT ^ CCAC7AAATGTAGCTGTTATCAA ^^ 

acgtccgccttaaaaacgtcatttacatcgacaatagttaccccat^^ 2 ' ? ° 

2 321 GATGTKAA ^ CCTCTCCTAATGGC ^ 

ctacawttttggagacgattaccgtaaacagaccgtatttcccacttcagtaatggaaaggtactattgagtagtgagaa 2 1 00 



2 4 01 TGCTCTATTGAGTATAAATACTAA ^ 

acgacataactcatatttatcattttaattct aatttgcaaataaatgatggt aaaatatgggctggccttatttcaaat 2 " 6 ° 

24 81 TGCTGATTGCGTATTACA "^^ NAAAATGCAAGTTAAA ^ 

accactaacgcataatctaaaaaanttttacgttcaatttcggtccacaaaaagatagagttatc 2 bo u 

TACTACTTGTGCTA7AATAACCCTTTAACCATCCCCCATCCGCTGTGAGCTGTATACCATAATCATGCACGTCCGGGTCT 
2S61 ----------- — ----.-—#-........».......__ f . ^ 

ATGATGAACACCATATTATTGGCAAA7TCGTAGGCGCTAGGCGACACTCGACATATCGTATT AGTACCTGCAGGCCCACA ' " ' " 

Tn insertion P11B9 
(J 

?64| GCGCAA * CRCTAGTG7CAW ™^ 

CCCCTTtGYCATCACAGTKKATCCGTTCTGTTCCGAATCCATTCGAAAGGTCCAGTAAATTCTTGTTTCT^AT^TTTA ? ' ?? 
2*>21 GCTTCTCAGAAAATTTCTYCV8HMWWMMNMNWNHNNNWNNNNNNNNNN ^^ 

CGAACACTCTTTTAAAGAWJRVONMMMMMNNnnnnnn^^ ?80 ° 

Figure 12 






2801 jll r A 5 A S fJ^ 

N RRKS SSG RS WMTAKR R S WYWW AATTACCTTACG CAAAATT TTCACCC TCC TACTTACGCA^/WTCTCT AT^ ACCCTC " 8 ° 

; 8 e j - T I T -T' A J. C ^^^ 

AAACATACTTTAAGCGACTGTTCGTGTAGGCATTTTTCCGACTAACTCTAAATA^ 2960 

2961 

AGTCCTACCACACATCTATATCGAACACTCTCCGCTAACATAG^ 3040 
304 1 -^I^^ 

TOGTAACTGTATTTTTCAATGTTAAACKTTTTAATAAATAA^ 3120 

5121 -"^^11^ 

T^WAACATCACTAGCTW^ 320 <> 

32oi -~?!!!^ 

ctactttgatatacatgacgctatcactagttcacggV^ 3280 

2281 

SSACT TAG AG TTCTT ATG S S R S R Y N NN NN N W AAATCATT ACT CCG ATTGAAAAAATAAAA^ " 6 ° 

nsi -^S^^ 

^acgatacacacgmtwcgt^ 34 < 0 



ataggcwacttatmcatgattcgttagttgccaaacttcttcgacttgca^ 3520 



3521 -—^J!^^ 

AGTAATCGCTGACTCTAAGTAGTATTCCTATAAAAGGGACTCCACTC^GCC 

3601 M 5A C ^ 

TATCA 



Tn insertion ?3F4 
II 



AAATAAGAATTAGTATTTTTACTCTAAAGCA*™^^ »'« 

ccctttttcgaaaccaaatatccgttggctaccccccaa1aVc*a^c"a^VccV " : • 

TTGCGAAGGCAACGCCAACCCCTACTCCcVrCcVcVT^^CTc"^ J ' :0 

CGGTGGACGGTAATCTACTATCATAAGCT<^TACCGACCTAGT^TGTTG^ 

TTTmTCCATGTGTCAATCTTTTACATTGCGAVCT^CCTACC^ 

gcaacgtaccggggcctacctcagaccaatgcmVaVgccVaVgccaVtac^ 

416! - C -'!^l^™^ 

c ™"tagg(^tctmttgtcgtaa*c«c^ ,?j 



0 
CTT 



Figure 12 
4 ? 4 1 _ A _ CCCTTAT ^ CCT ^ GT CGATCTCATTAA^ 

T <^CAATACCGCAAAACAGCTACAGTAATTATTTTCGCGTTGACCT <J? ° 
4j?l - A _ T .°^™ G J* T ^ 

TACTT AATCTATCATAACGCCCACGAAAATTGCTTCACCAACTATCACATCTTCACGTTATCCT^ * * °° 

4401 - C I C -?^^ 

CAGCCTCTCCGTOKJTCCGCGATTTACTTCGTTTTTrrGCGCGACTCG^ M8 ° 

4 4 61 - C - T ^™ G l CATC ^^ ACGT 

CATTATTCAGTACTCAATGCATGAGGCTACTTACCGCATGAGCCACGTTAACTrAATM 4 " 0 

4561 -'^^^i^ 

TCTCGTTCTTCCTAATCGACTATGCCGCTCmAACATGTGACAGAM 46 '° 

4 64 1 ^™l C ^." CTCGT W 

CTGCCTACCTCAGACCACTAAAGTCTAATGTATACCTTCTTTGTCGCAATG ° ? ° 

4121 - A - T - CCA - CG f. G ^ 

TAGGTCCCCGGTCGCGNTTTCGTTTTTTGACAGTAATGCATGAAAACAGCGAG^GTAC^ ' 8 °° 

WCATAGGCAAATCNNCTTTAAAACC^ 4880 
4861 .^^^1^ 

CrcAGTTCKAGTACTCCTTGTTAArTATAAAGACCMTCGCTATrc ' 9 *° 

4 961 

TAGAAATCACGAAAMTAGTTCCTCTGTGTTTAAGC^^ 504 0 

S04 1 -^J^^ 

TTACTACTCGCCATTAGACTGTGATTTTTCACAGGGGCCCC 51 20 

Tn insertion 

a 



TGGTCGGCGGAGTTGGTTAATTTCCCTGCGACAGTCNWNGGCAAGACGGACGTAGCCGTTG^ "° r ' 

GCCACTTGGTGGGGTGGTCGTTTTACGCGAACAGTTNNGNTCTCGAAAACAT " 8 ° 

C ™TCAATTATAACACATGTCCTTTATACGGT^ " 6 ° 

sj6i 

ACTACTACCGCTATAATTACCCCTATAGT^ * 4 ° 

54 4 , 

CATTGCTCCGAGACTGAMTACTGTTGTCGTCGCTAAGCTAMTCATcV^ " 2 ° 
557, .^^""A^ 

taacttacacatgctaataccgtactactcggcttattaaatctaggactgacgtacaa^ S60 ° 

5601 AS ^^ A ^ 

TSCaNKTC^YWAKWAGYRRCAHHTTTTTTWKCWHDWACTADTRTNW^^ " 80 

Figure 12 
TCCCT ACATC AC TAT TGCCGCAGAATACCAACTTTT ACGAAATATAGAGCTACAGGAGCAGG ATCC 

S6BI — S, 46 

ACCGATCTAGTCATAACGCCGTCTrATGCTTGAAAATGCTTTATATCTC(^TCTCCTCCTCCTAGC 



Figure 12 